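Databricks Runtime 15.3 for Machine Learning

Databricks Runtime 15.3 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 15.3. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also includes AutoML, a tool for automatically training machine learning pipelines, and supports distributed deep learning training with Horovod.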

New features and improvements

Databricks Runtime 15.3 ML is built on top of Databricks Runtime 15.3. For information on what's new in Databricks Runtime 15.3, including Apache Spark MLlib and SparkR, see the Databricks Runtime 15.3 release notes.

Manual data splits and sample weights in Databricks AutoML

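AutoML now supports manual data splits, which let you assign each row to the training, validation, or test dataset for classification and regression models. See Split data into training, validation, and test sets.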
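
As an illustration of a row-level split, the sketch below tags each row with a split label before the data is handed to AutoML. It is a minimal, standard-library-only example: the `assign_split` helper, the 60/20/20 ratios, the row layout, and the column name `split` are all hypothetical, and the label values `train`, `validate`, and `test` are an assumption about what AutoML expects in the split column; check the AutoML Python API reference before relying on them.

```python
import hashlib


def assign_split(row_key: str, train: float = 0.6, validate: float = 0.2) -> str:
    """Deterministically map a row key to 'train', 'validate', or 'test'.

    Hypothetical helper: the ratios and label strings are assumptions,
    not values taken from the release notes.
    """
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    # Scale the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < train:
        return "train"
    if bucket < train + validate:
        return "validate"
    return "test"


# Toy dataset: each row gets an explicit split label in a 'split' column.
rows = [{"id": f"row-{i}", "feature": i, "label": i % 2} for i in range(1000)]
for row in rows:
    row["split"] = assign_split(row["id"])
```

Hashing a stable key, rather than drawing random numbers, keeps each row in the same split across reruns. On a cluster, the equivalent column in a DataFrame would presumably be passed through the API's split-column parameter described in the referenced documentation.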

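AutoML now supports sample weights, which let you adjust the importance of each row when training regression models. For details, see the regression parameters in the AutoML Python API reference.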
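
To see what per-row weighting does in general, the following sketch computes a weighted mean squared error in plain Python. It illustrates the concept only and is not AutoML's internal loss; the `weighted_mse` function and the toy numbers are hypothetical.

```python
def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error: sum(w * (t - p)^2) / sum(w).

    Illustrative only; this is not how AutoML computes its metrics.
    """
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)) / total


y_true = [10.0, 12.0, 50.0]
y_pred = [11.0, 12.0, 40.0]

# Equal weights: the third row's large error dominates the metric.
uniform = weighted_mse(y_true, y_pred, [1.0, 1.0, 1.0])

# Down-weighting the third row reduces its influence on the metric.
downweighted = weighted_mse(y_true, y_pred, [1.0, 1.0, 0.1])
```

Here the third row has the largest error, so lowering its weight lowers the overall metric. A model trained against a weighted objective is pushed in the same direction: it fits high-weight rows more closely.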

System environment

The system environment in Databricks Runtime 15.3 ML differs from Databricks Runtime 15.3 as follows:

  • For GPU clusters, Databricks Runtime ML includes the following NVIDIA GPU libraries:
    • CUDA 12.1
    • cusolver 11.4.5.107-1
    • cupti 12.1
    • cuDNN 8.9.0.131-1
    • NCCL 2.17.1
    • TensorRT 8.6.1.6-1

Libraries

The following sections list the libraries included in Databricks Runtime 15.3 ML that differ from those included in Databricks Runtime 15.3.

Top-tier libraries

Databricks Runtime 15.3 ML includes the following top-tier libraries:

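Python libraries

Databricks Runtime 15.3 ML uses virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 15.3 ML also includes the following packages: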

  • hyperopt 0.2.7+db3
  • sparkdl 3.0.0_db1
  • automl 1.27.0

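To reproduce the Databricks Runtime ML Python environment in a local Python virtual environment, download the requirements-15.3.txt file and run pip install -r requirements-15.3.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but it does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-engineering, or the Databricks fork of hyperopt.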
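
After installing, you may want to confirm that the local environment matches the pinned versions. The sketch below is one possible check using only the standard library. It assumes the requirements file consists of `name==version` lines, which is an assumption about the file's format; the helper names `parse_pins` and `find_mismatches` are hypothetical.

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Extract {package: version} from 'name==version' lines.

    Assumes a simple pinned format; comments, blank lines, environment
    markers, and lines without '==' (for example pip options) are skipped.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple]:
    """Return {package: (pinned, installed)} for packages that differ.

    'installed' is None when the package is not present locally.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Two pins taken from the CPU package table, used here as sample input.
sample = """
# excerpt for illustration
numpy==1.23.5
pandas==1.5.3  # pinned
"""
pins = parse_pins(sample)
```

In practice you would read the downloaded file and pass its text to `parse_pins`, then inspect the result of `find_mismatches(pins)`. The Databricks-developed packages listed above will always show up as missing, because the requirements file does not install them.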

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.30.1 aiohttp 3.8.5
aiohttp-cors 0.7.0 aiosignal 1.2.0 anyio 3.5.0
argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astor 0.8.1
asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.2
attrs 22.1.0 audioread 3.0.1 azure-core 1.30.1
azure-cosmos 4.3.1 azure-identity 1.16.0 azure-storage-blob 12.19.1
azure-storage-file-datalake 12.14.0 backcall 0.2.0 bcrypt 3.2.0
beautifulsoup4 4.12.2 black 23.3.0 bleach 4.1.0
blinker 1.4 blis 0.7.11 boto3 1.34.39
botocore 1.34.39 Brotli 1.0.9 cachetools 5.3.3
catalogue 2.0.10 category-encoders 2.6.3 certifi 2023.7.22
cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.4
circuitbreaker 1.4.0 click 8.0.4 cloudpathlib 0.16.0
cloudpickle 2.2.1 cmdstanpy 1.2.2 colorful 0.5.6
comm 0.1.2 confection 0.1.4 configparser 5.2.0
contourpy 1.0.5 cryptography 41.0.3 cycler 0.11.0
cymem 2.0.8 Cython 0.29.32 dacite 1.8.1
databricks-automl-runtime 0.2.21 databricks-feature-engineering 0.5.0 databricks-sdk 0.20.0
dataclasses-json 0.6.6 datasets 2.19.1 dbl-tempo 0.1.26
dbu-python 1.2.18 debugpy 1.6.7 decorator 5.1.1
deepspeed 0.14.0 defusedxml 0.7.1 dill 0.3.6
diskcache 5.6.3 distlib 0.3.8 dm-tree 0.1.8
entrypoints 0.4 evaluate 0.4.2 executing 0.8.3
facets-overview 1.1.1 Farama-Notifications 0.0.4 fastjsonschema 2.19.1
fasttext 0.9.2 filelock 3.13.4 Flask 2.2.5
flatbuffers 24.3.25 fonttools 4.25.0 frozenlist 1.3.3
fsspec 2023.5.0 future 0.18.3 gast 0.4.0
gitdb 4.0.11 GitPython 3.1.27 google-api-core 2.18.0
google-auth 2.21.0 google-auth-oauthlib 1.0.0 google-cloud-core 2.4.1
google-cloud-storage 2.10.0 google-crc32c 1.5.0 google-pasta 0.2.0
google-resumable-media 2.7.0 googleapis-common-protos 1.63.0 greenlet 2.0.1
grpcio 1.60.0 grpcio-status 1.60.0 gunicorn 20.1.0
gviz-api 1.10.0 gymnasium 0.28.1 h11 0.14.0
h5py 3.10.0 hjson 3.1.0 holidays 0.45
horovod 0.28.1+db1 htmlmin 0.1.12 httpcore 1.0.5
httplib2 0.20.2 httpx 0.27.0 huggingface-hub 0.21.2
idna 3.4 ImageHash 4.3.1 imageio 2.31.1
imbalanced-learn 0.11.0 importlib-metadata 6.0.0 importlib_resources 6.4.0
ipyflow-core 0.0.198 ipykernel 6.25.1 ipython 8.15.0
ipython-genutils 0.2.0 ipywidgets 7.7.2 isodate 0.6.1
itsdangerous 2.0.1 jax-jumpy 1.0.0 jedi 0.18.1
jeepney 0.7.1 Jinja2 3.1.2 jmespath 0.10.0
joblib 1.2.0 joblibspark 0.5.1 jsonpatch 1.33
jsonpointer 2.4 jsonschema 4.17.3 jupyter-server 1.23.4
jupyter_client 7.4.9 jupyter_core 5.3.0 jupyterlab-pygments 0.1.2
keras 3.1.1 keyring 23.5.0 kiwisolver 1.4.4
langchain 0.1.20 langchain-community 0.0.38 langchain-core 0.1.52
langchain-text-splitters 0.0.2 langcodes 3.4.0 langsmith 0.1.63
language_data 1.2.0 launchpadlib 1.10.16 lazr.restfulclient 0.14.4
lazr.uri 1.0.6 lazy_loader 0.2 libclang 15.0.6.1
librosa 0.10.1 lightgbm 4.3.0 linkify-it-py 2.0.0
llvmlite 0.40.0 lxml 4.9.2 lz4 4.3.2
Mako 1.2.0 marisa-trie 1.1.1 Markdown 3.4.1
markdown-it-py 2.2.0 MarkupSafe 2.1.1 marshmallow 3.21.2
matplotlib 3.7.2 matplotlib-inline 0.1.6 mdit-py-plugins 0.3.0
mdurl 0.1.0 memray 1.12.0 mistune 0.8.4
ml-dtypes 0.3.2 mlflow-skinny 2.11.3 more-itertools 8.10.0
mosaicml-streaming 0.7.4 mpmath 1.3.0 msal 1.28.0
msal-extensions 1.1.0 msgpack 1.0.8 multidict 6.0.2
multimethod 1.11.2 multiprocess 0.70.14 murmurhash 1.0.10
mypy-extensions 0.4.3 namex 0.0.8 nbclassic 0.5.5
nbclient 0.5.13 nbconvert 6.5.4 nbformat 5.7.0
nest-asyncio 1.5.6 networkx 3.1 ninja 1.11.1.1
nltk 3.8.1 notebook 6.5.4 notebook_shim 0.2.2
numba 0.57.1 numpy 1.23.5 oauthlib 3.2.0
oci 2.126.4 openai 1.29.0 opencensus 0.11.4
opencensus-context 0.1.3 opt-einsum 3.3.0 optree 0.11.0
orjson 3.10.3 packaging 23.2 pandas 1.5.3
pandocfilters 1.5.0 paramiko 3.4.0 parso 0.8.3
pathspec 0.10.3 patsy 0.5.3 petastorm 0.12.1
pexpect 4.8.0 phik 0.12.4 pickleshare 0.7.5
Pillow 9.4.0 pip 23.2.1 platformdirs 3.10.0
plotly 5.9.0 pmdarima 2.0.4 pooch 1.8.1
portalocker 2.8.2 preshed 3.0.9 prometheus-client 0.14.1
prompt-toolkit 3.0.36 prophet 1.1.5 proto-plus 1.23.0
protobuf 4.24.1 psutil 5.9.0 psycopg2 2.9.3
ptyprocess 0.7.0 pure-eval 0.2.2 py-cpuinfo 8.0.0
py-spy 0.3.14 pyarrow 14.0.1 pyarrow-hotfix 0.6
pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.12.0
pyccolo 0.0.52 pycparser 2.21 pydantic 1.10.6
Pygments 2.15.1 PyGObject 3.42.1 PyJWT 2.3.0
PyNaCl 1.5.0 pynvml 11.5.0 pyodbc 4.0.38
pyOpenSSL 23.2.0 pyparsing 3.0.9 pyrsistent 0.18.0
pytesseract 0.3.10 python-dateutil 2.8.2 python-editor 1.0.4
python-lsp-jsonrpc 1.1.1 python-snappy 0.6.1 pytz 2022.7
PyWavelets 1.4.1 PyYAML 6.0 pyzmq 23.2.0
ray 2.12.0 regex 2022.7.9 requests 2.31.0
requests-oauthlib 1.3.1 rich 13.7.1 rsa 4.9
s3transfer 0.10.1 safetensors 0.4.2 scikit-image 0.20.0
scikit-learn 1.3.0 scipy 1.11.1 seaborn 0.12.2
SecretStorage 3.3.1 Send2Trash 1.8.0 sentence-transformers 2.7.0
sentencepiece 0.1.99 setuptools 68.0.0 shap 0.44.0
simplejson 3.17.6 six 1.16.0 slicer 0.0.7
smart-open 5.2.1 smmap 5.0.0 sniffio 1.2.0
soundfile 0.12.1 soupsieve 2.4 soxr 0.3.7
spacy 3.7.2 spacy-legacy 3.0.12 spacy-loggers 1.0.5
spark-tensorflow-distributor 1.0.0 SQLAlchemy 1.4.39 sqlparse 0.4.2
srsly 2.4.8 ssh-import-id 5.11 stack-data 0.2.0
stanio 0.5.0 statsmodels 0.14.0 sympy 1.11.1
tangled-up-in-unicode 0.2.0 tenacity 8.2.2 tensorboard 2.16.2
tensorboard-data-server 0.7.2 tensorboard_plugin_profile 2.15.1 tensorboardX 2.6.2.2
tensorflow 2.16.1 tensorflow-estimator 2.15.0 tensorflow-io-gcs-filesystem 0.37.0
termcolor 2.4.0 terminado 0.17.1 textual 0.63.3
tf_keras 2.16.0 thinc 8.2.3 threadpoolctl 2.2.0
tifffile 2021.7.2 tiktoken 0.5.2 tinycss2 1.2.1
tokenize-rt 4.2.1 tokenizers 0.19.0 torch 2.3.0+cpu
torcheval 0.0.7 torchvision 0.18.0+cpu tornado 6.3.2
tqdm 4.65.0 traitlets 5.7.1 transformers 4.40.2
typeguard 2.13.3 typer 0.9.4 typing-inspect 0.9.0
typing_extensions 4.10.0 tzdata 2022.1 uc-micro-py 1.0.1
ujson 5.4.0 unattended-upgrades 0.1 urllib3 1.26.16
virtualenv 20.24.2 visions 0.7.5 wadllib 1.3.6
wasabi 1.1.2 wcwidth 0.2.5 weasel 0.3.4
webencodings 0.5.1 websocket-client 0.58.0 Werkzeug 2.2.3
wheel 0.38.4 wordcloud 1.9.3 wrapt 1.14.1
xgboost 2.0.3 xxhash 3.4.1 yarl 1.8.1
ydata-profiling 4.5.1 zipp 3.11.0 zstd 1.5.5.1

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.30.1 aiohttp 3.8.5
aiohttp-cors 0.7.0 aiosignal 1.2.0 anyio 3.5.0
argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astor 0.8.1
asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.2
attrs 22.1.0 audioread 3.0.1 azure-core 1.30.1
azure-cosmos 4.3.1 azure-identity 1.16.0 azure-storage-blob 12.19.1
azure-storage-file-datalake 12.14.0 backcall 0.2.0 bcrypt 3.2.0
beautifulsoup4 4.12.2 black 23.3.0 bleach 4.1.0
blinker 1.4 blis 0.7.11 boto3 1.34.39
botocore 1.34.39 Brotli 1.0.9 cachetools 5.3.3
catalogue 2.0.10 category-encoders 2.6.3 certifi 2023.7.22
cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.4
circuitbreaker 1.4.0 click 8.0.4 cloudpathlib 0.16.0
cloudpickle 2.2.1 cmdstanpy 1.2.2 colorful 0.5.6
comm 0.1.2 confection 0.1.4 configparser 5.2.0
contourpy 1.0.5 cryptography 41.0.3 cycler 0.11.0
cymem 2.0.8 Cython 0.29.32 dacite 1.8.1
databricks-automl-runtime 0.2.21 databricks-feature-engineering 0.5.0 databricks-sdk 0.20.0
dataclasses-json 0.6.6 datasets 2.19.1 dbl-tempo 0.1.26
dbu-python 1.2.18 debugpy 1.6.7 decorator 5.1.1
deepspeed 0.14.0 defusedxml 0.7.1 dill 0.3.6
diskcache 5.6.3 distlib 0.3.8 dm-tree 0.1.8
einops 0.8.0 entrypoints 0.4 evaluate 0.4.2
executing 0.8.3 facets-overview 1.1.1 Farama-Notifications 0.0.4
fastjsonschema 2.19.1 fasttext 0.9.2 filelock 3.13.4
flash-attn 2.5.8 Flask 2.2.5 flatbuffers 24.3.25
fonttools 4.25.0 frozenlist 1.3.3 fsspec 2023.5.0
future 0.18.3 gast 0.4.0 gitdb 4.0.11
GitPython 3.1.27 google-api-core 2.18.0 google-auth 2.21.0
google-auth-oauthlib 1.0.0 google-cloud-core 2.4.1 google-cloud-storage 2.10.0
google-crc32c 1.5.0 google-pasta 0.2.0 google-resumable-media 2.7.0
googleapis-common-protos 1.63.0 greenlet 2.0.1 grpcio 1.60.0
grpcio-status 1.60.0 gunicorn 20.1.0 gviz-api 1.10.0
gymnasium 0.28.1 h11 0.14.0 h5py 3.10.0
hjson 3.1.0 holidays 0.45 horovod 0.28.1+db1
htmlmin 0.1.12 httpcore 1.0.5 httplib2 0.20.2
httpx 0.27.0 huggingface-hub 0.21.2 idna 3.4
ImageHash 4.3.1 imageio 2.31.1 imbalanced-learn 0.11.0
importlib-metadata 6.0.0 importlib_resources 6.4.0 ipyflow-core 0.0.198
ipykernel 6.25.1 ipython 8.15.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 isodate 0.6.1 itsdangerous 2.0.1
jax-jumpy 1.0.0 jedi 0.18.1 jeepney 0.7.1
Jinja2 3.1.2 jmespath 0.10.0 joblib 1.2.0
joblibspark 0.5.1 jsonpatch 1.33 jsonpointer 2.4
jsonschema 4.17.3 jupyter-server 1.23.4 jupyter_client 7.4.9
jupyter_core 5.3.0 jupyterlab-pygments 0.1.2 keras 3.1.1
keyring 23.5.0 kiwisolver 1.4.4 langchain 0.1.20
langchain-community 0.0.38 langchain-core 0.1.52 langchain-text-splitters 0.0.2
langcodes 3.4.0 langsmith 0.1.63 language_data 1.2.0
launchpadlib 1.10.16 lazr.restfulclient 0.14.4 lazr.uri 1.0.6
lazy_loader 0.2 libclang 15.0.6.1 librosa 0.10.1
lightgbm 4.3.0 linkify-it-py 2.0.0 llvmlite 0.40.0
lxml 4.9.2 lz4 4.3.2 Mako 1.2.0
marisa-trie 1.1.1 Markdown 3.4.1 markdown-it-py 2.2.0
MarkupSafe 2.1.1 marshmallow 3.21.2 matplotlib 3.7.2
matplotlib-inline 0.1.6 mdit-py-plugins 0.3.0 mdurl 0.1.0
memray 1.12.0 mistune 0.8.4 ml-dtypes 0.3.2
mlflow-skinny 2.11.3 more-itertools 8.10.0 mosaicml-streaming 0.7.4
mpmath 1.3.0 msal 1.28.0 msal-extensions 1.1.0
msgpack 1.0.8 multidict 6.0.2 multimethod 1.11.2
multiprocess 0.70.14 murmurhash 1.0.10 mypy-extensions 0.4.3
namex 0.0.8 nbclassic 0.5.5 nbclient 0.5.13
nbconvert 6.5.4 nbformat 5.7.0 nest-asyncio 1.5.6
networkx 3.1 ninja 1.11.1.1 nltk 3.8.1
notebook 6.5.4 notebook_shim 0.2.2 numba 0.57.1
numpy 1.23.5 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.0 oci 2.126.4
openai 1.29.0 opencensus 0.11.4 opencensus-context 0.1.3
opt-einsum 3.3.0 optree 0.11.0 orjson 3.10.3
packaging 23.2 pandas 1.5.3 pandocfilters 1.5.0
paramiko 3.4.0 parso 0.8.3 pathspec 0.10.3
patsy 0.5.3 petastorm 0.12.1 pexpect 4.8.0
phik 0.12.4 pickleshare 0.7.5 Pillow 9.4.0
pip 23.2.1 platformdirs 3.10.0 plotly 5.9.0
pmdarima 2.0.4 pooch 1.8.1 portalocker 2.8.2
preshed 3.0.9 prometheus-client 0.14.1 prompt-toolkit 3.0.36
prophet 1.1.5 proto-plus 1.23.0 protobuf 4.24.1
psutil 5.9.0 psycopg2 2.9.3 ptyprocess 0.7.0
pure-eval 0.2.2 py-cpuinfo 8.0.0 py-spy 0.3.14
pyarrow 14.0.1 pyarrow-hotfix 0.6 pyasn1 0.4.8
pyasn1-modules 0.2.8 pybind11 2.12.0 pyccolo 0.0.52
pycparser 2.21 pydantic 1.10.6 Pygments 2.15.1
PyGObject 3.42.1 PyJWT 2.3.0 PyNaCl 1.5.0
pynvml 11.5.0 pyodbc 4.0.38 pyOpenSSL 23.2.0
pyparsing 3.0.9 pyrsistent 0.18.0 pytesseract 0.3.10
python-dateutil 2.8.2 python-editor 1.0.4 python-lsp-jsonrpc 1.1.1
python-snappy 0.6.1 pytz 2022.7 PyWavelets 1.4.1
PyYAML 6.0 pyzmq 23.2.0 ray 2.12.0
regex 2022.7.9 requests 2.31.0 requests-oauthlib 1.3.1
rich 13.7.1 rsa 4.9 s3transfer 0.10.1
safetensors 0.4.2 scikit-image 0.20.0 scikit-learn 1.3.0
scipy 1.11.1 seaborn 0.12.2 SecretStorage 3.3.1
Send2Trash 1.8.0 sentence-transformers 2.7.0 sentencepiece 0.1.99
setuptools 68.0.0 shap 0.44.0 simplejson 3.17.6
six 1.16.0 slicer 0.0.7 smart-open 5.2.1
smmap 5.0.0 sniffio 1.2.0 soundfile 0.12.1
soupsieve 2.4 soxr 0.3.7 spacy 3.7.2
spacy-legacy 3.0.12 spacy-loggers 1.0.5 spark-tensorflow-distributor 1.0.0
SQLAlchemy 1.4.39 sqlparse 0.4.2 srsly 2.4.8
ssh-import-id 5.11 stack-data 0.2.0 stanio 0.5.0
statsmodels 0.14.0 sympy 1.11.1 tangled-up-in-unicode 0.2.0
tenacity 8.2.2 tensorboard 2.16.2 tensorboard-data-server 0.7.2
tensorboard_plugin_profile 2.15.1 tensorboardX 2.6.2.2 tensorflow 2.16.1
tensorflow-estimator 2.15.0 tensorflow-io-gcs-filesystem 0.37.0 termcolor 2.4.0
terminado 0.17.1 textual 0.63.3 tf_keras 2.16.0
thinc 8.2.3 threadpoolctl 2.2.0 tifffile 2021.7.2
tiktoken 0.5.2 tinycss2 1.2.1 tokenize-rt 4.2.1
tokenizers 0.19.0 torch 2.3.0+cu121 torcheval 0.0.7
torchvision 0.18.0+cu121 tornado 6.3.2 tqdm 4.65.0
traitlets 5.7.1 transformers 4.40.2 triton 2.3.0
typeguard 2.13.3 typer 0.9.4 typing-inspect 0.9.0
typing_extensions 4.10.0 tzdata 2022.1 uc-micro-py 1.0.1
ujson 5.4.0 unattended-upgrades 0.1 urllib3 1.26.16
virtualenv 20.24.2 visions 0.7.5 wadllib 1.3.6
wasabi 1.1.2 wcwidth 0.2.5 weasel 0.3.4
webencodings 0.5.1 websocket-client 0.58.0 Werkzeug 2.2.3
wheel 0.38.4 wordcloud 1.9.3 wrapt 1.14.1
xgboost 2.0.3 xxhash 3.4.1 yarl 1.8.1
ydata-profiling 4.5.1 zipp 3.11.0 zstd 1.5.5.1

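R libraries

The R libraries are identical to the R libraries in Databricks Runtime 15.3.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 15.3, Databricks Runtime 15.3 ML contains the following JARs: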

CPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.dmlc xgboost4j-spark_2.12 1.7.3
ml.dmlc xgboost4j_2.12 1.7.3
org.graphframes graphframes_2.12 0.8.3-db1-spark3.5
org.mlflow mlflow-client 2.11.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0

GPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.dmlc xgboost4j-gpu_2.12 1.7.3
ml.dmlc xgboost4j-spark-gpu_2.12 1.7.3
org.graphframes graphframes_2.12 0.8.3-db1-spark3.5
org.mlflow mlflow-client 2.11.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0