Databricks Runtime 15.3 for Machine Learning

Databricks Runtime 15.3 for Machine Learning provides a ready-to-use environment for machine learning and data science, built on Databricks Runtime 15.3. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost, along with AutoML, a tool that automatically trains machine learning pipelines. It also supports distributed deep learning training using Horovod.

Tip

For release notes on Databricks Runtime versions that have reached end-of-support (EoS), see End-of-support Databricks Runtime release notes. EoS Databricks Runtime versions have been retired and might not be updated.

New features and improvements

Databricks Runtime 15.3 ML is built on top of Databricks Runtime 15.3. For information on what's new in Databricks Runtime 15.3, including Apache Spark MLlib and SparkR, see the Databricks Runtime 15.3 release notes.

Mosaic AutoML manual data splits and sample weights

AutoML now supports manual data splits, which let you assign each row to the train, validate, or test set for classification and regression models. See Split data into train, validation, and test sets.

AutoML now supports sample weights, letting you adjust the importance of each row during regression model training. For more information, see the regression parameters for the AutoML Python API.
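
As a rough illustration of both features together, the following sketch builds a per-row split label and passes it, along with a weight column, to an AutoML regression run. The split_col and sample_weight_col parameter names and the label values "train", "validate", and "test" are assumptions based on the AutoML Python API; check them against the API reference before relying on them. The helper names, column names, and the 60/20/20 proportions are hypothetical.

```python
import random

# Assumed label values for the split column; confirm against the AutoML docs.
SPLIT_LABELS = ("train", "validate", "test")


def assign_splits(n_rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return one split label per row, drawn with the given proportions."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return rng.choices(SPLIT_LABELS, weights=fractions, k=n_rows)


def run_automl_regression(df, target_col, split_col="split", weight_col="weight"):
    """Start an AutoML regression run on a Databricks Runtime 15.3 ML cluster.

    `df` is a DataFrame that already contains `split_col` and `weight_col`.
    """
    # Imported lazily: this package is only available on Databricks Runtime ML.
    from databricks import automl

    return automl.regress(
        dataset=df,
        target_col=target_col,
        split_col=split_col,            # assumed parameter name for manual splits
        sample_weight_col=weight_col,   # assumed parameter name for row weights
        timeout_minutes=30,
    )
```

With a pandas DataFrame, the split column could be attached with df["split"] = assign_splits(len(df)) before calling run_automl_regression(df, target_col="price"), where "price" stands in for your own target column.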

System environment

The system environment in Databricks Runtime 15.3 ML differs from Databricks Runtime 15.3 as follows:

  • For GPU clusters, Databricks Runtime ML includes the following NVIDIA GPU libraries:
    • CUDA 12.1
    • cusolver 11.4.5.107-1
    • cupti 12.1
    • cuDNN 8.9.0.131-1
    • NCCL 2.17.1
    • TensorRT 8.6.1.6-1
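
To see which of these versions a notebook actually picks up, a minimal check through PyTorch might look like the sketch below. It assumes PyTorch is the framework of interest; the function name is hypothetical. Note that PyTorch wheels can bundle their own CUDA libraries (see the nvidia-* packages in the GPU Python library list), so the reported cuDNN and NCCL versions may differ from the system libraries listed above.

```python
def gpu_library_report():
    """Collect the CUDA, cuDNN, and NCCL versions that PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}

    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        # These queries are only meaningful when a GPU is visible.
        report["cudnn"] = torch.backends.cudnn.version()
        report["nccl"] = torch.cuda.nccl.version()
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(gpu_library_report())
```

On a CPU cluster the report contains only the PyTorch version, a cuda_build of None, and gpu_available set to False.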

Libraries

The following sections list the libraries included in Databricks Runtime 15.3 ML that differ from those included in Databricks Runtime 15.3.

Top-tier libraries

Databricks Runtime 15.3 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 15.3 ML uses virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 15.3 ML also includes the following packages:

  • hyperopt 0.2.7+db3
  • sparkdl 3.0.0_db1
  • automl 1.27.0

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-15.3.txt file and run pip install -r requirements-15.3.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-engineering, or the Databricks fork of hyperopt.
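
After installing, it can be useful to confirm that the local environment matches the pinned versions. The sketch below compares installed packages against a requirements file using only the standard library. It assumes the file uses simple name==version lines, which has not been verified for requirements-15.3.txt; the function names are hypothetical.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        # Drop any environment marker that follows the version.
        pins[name.strip().lower()] = version.split(";", 1)[0].strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)}; installed is None if the package is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


def report(path):
    """Print every package whose installed version differs from its pin."""
    with open(path) as fh:
        pins = parse_pins(fh.read())
    for name, (pinned, installed) in sorted(find_mismatches(pins).items()):
        print(f"{name}: pinned {pinned}, installed {installed}")
```

Calling report("requirements-15.3.txt") inside the activated virtual environment lists any package that is missing or at a different version. Libraries developed by Databricks are not in the file, so they are not checked.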

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.30.1 aiohttp 3.8.5
aiohttp-cors 0.7.0 aiosignal 1.2.0 anyio 3.5.0
argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astor 0.8.1
asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.2
attrs 22.1.0 audioread 3.0.1 azure-core 1.30.1
azure-cosmos 4.3.1 azure-identity 1.16.0 azure-storage-blob 12.19.1
azure-storage-file-datalake 12.14.0 backcall 0.2.0 bcrypt 3.2.0
beautifulsoup4 4.12.2 black 23.3.0 bleach 4.1.0
blinker 1.4 blis 0.7.11 boto3 1.34.39
botocore 1.34.39 Brotli 1.0.9 cachetools 5.3.3
catalogue 2.0.10 category-encoders 2.6.3 certifi 2023.7.22
cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.4
circuitbreaker 1.4.0 click 8.0.4 cloudpathlib 0.16.0
cloudpickle 2.2.1 cmdstanpy 1.2.2 colorful 0.5.6
comm 0.1.2 confection 0.1.4 configparser 5.2.0
contourpy 1.0.5 cryptography 41.0.3 cycler 0.11.0
cymem 2.0.8 Cython 0.29.32 dacite 1.8.1
databricks-automl-runtime 0.2.21 databricks-feature-engineering 0.5.0 databricks-sdk 0.20.0
dataclasses-json 0.6.6 datasets 2.19.1 dbl-tempo 0.1.26
dbus-python 1.2.18 debugpy 1.6.7 decorator 5.1.1
deepspeed 0.14.0 defusedxml 0.7.1 dill 0.3.6
diskcache 5.6.3 distlib 0.3.8 dm-tree 0.1.8
entrypoints 0.4 evaluate 0.4.2 executing 0.8.3
facets-overview 1.1.1 Farama-Notifications 0.0.4 fastjsonschema 2.19.1
fasttext 0.9.2 filelock 3.13.4 Flask 2.2.5
flatbuffers 24.3.25 fonttools 4.25.0 frozenlist 1.3.3
fsspec 2023.5.0 future 0.18.3 gast 0.4.0
gitdb 4.0.11 GitPython 3.1.27 google-api-core 2.18.0
google-auth 2.21.0 google-auth-oauthlib 1.0.0 google-cloud-core 2.4.1
google-cloud-storage 2.10.0 google-crc32c 1.5.0 google-pasta 0.2.0
google-resumable-media 2.7.0 googleapis-common-protos 1.63.0 greenlet 2.0.1
grpcio 1.60.0 grpcio-status 1.60.0 gunicorn 20.1.0
gviz-api 1.10.0 gymnasium 0.28.1 h11 0.14.0
h5py 3.10.0 hjson 3.1.0 holidays 0.45
horovod 0.28.1+db1 htmlmin 0.1.12 httpcore 1.0.5
httplib2 0.20.2 httpx 0.27.0 huggingface-hub 0.21.2
idna 3.4 ImageHash 4.3.1 imageio 2.31.1
imbalanced-learn 0.11.0 importlib-metadata 6.0.0 importlib_resources 6.4.0
ipyflow-core 0.0.198 ipykernel 6.25.1 ipython 8.15.0
ipython-genutils 0.2.0 ipywidgets 7.7.2 isodate 0.6.1
itsdangerous 2.0.1 jax-jumpy 1.0.0 jedi 0.18.1
jeepney 0.7.1 Jinja2 3.1.2 jmespath 0.10.0
joblib 1.2.0 joblibspark 0.5.1 jsonpatch 1.33
jsonpointer 2.4 jsonschema 4.17.3 jupyter-server 1.23.4
jupyter_client 7.4.9 jupyter_core 5.3.0 jupyterlab-pygments 0.1.2
keras 3.1.1 keyring 23.5.0 kiwisolver 1.4.4
langchain 0.1.20 langchain-community 0.0.38 langchain-core 0.1.52
langchain-text-splitters 0.0.2 langcodes 3.4.0 langsmith 0.1.63
language_data 1.2.0 launchpadlib 1.10.16 lazr.restfulclient 0.14.4
lazr.uri 1.0.6 lazy_loader 0.2 libclang 15.0.6.1
librosa 0.10.1 lightgbm 4.3.0 linkify-it-py 2.0.0
llvmlite 0.40.0 lxml 4.9.2 lz4 4.3.2
Mako 1.2.0 marisa-trie 1.1.1 Markdown 3.4.1
markdown-it-py 2.2.0 MarkupSafe 2.1.1 marshmallow 3.21.2
matplotlib 3.7.2 matplotlib-inline 0.1.6 mdit-py-plugins 0.3.0
mdurl 0.1.0 memray 1.12.0 mistune 0.8.4
ml-dtypes 0.3.2 mlflow-skinny 2.11.3 more-itertools 8.10.0
mosaicml-streaming 0.7.4 mpmath 1.3.0 msal 1.28.0
msal-extensions 1.1.0 msgpack 1.0.8 multidict 6.0.2
multimethod 1.11.2 multiprocess 0.70.14 murmurhash 1.0.10
mypy-extensions 0.4.3 namex 0.0.8 nbclassic 0.5.5
nbclient 0.5.13 nbconvert 6.5.4 nbformat 5.7.0
nest-asyncio 1.5.6 networkx 3.1 ninja 1.11.1.1
nltk 3.8.1 notebook 6.5.4 notebook_shim 0.2.2
numba 0.57.1 numpy 1.23.5 oauthlib 3.2.0
oci 2.126.4 openai 1.29.0 opencensus 0.11.4
opencensus-context 0.1.3 opt-einsum 3.3.0 optree 0.11.0
orjson 3.10.3 packaging 23.2 pandas 1.5.3
pandocfilters 1.5.0 paramiko 3.4.0 parso 0.8.3
pathspec 0.10.3 patsy 0.5.3 petastorm 0.12.1
pexpect 4.8.0 phik 0.12.4 pickleshare 0.7.5
Pillow 9.4.0 pip 23.2.1 platformdirs 3.10.0
plotly 5.9.0 pmdarima 2.0.4 pooch 1.8.1
portalocker 2.8.2 preshed 3.0.9 prometheus-client 0.14.1
prompt-toolkit 3.0.36 prophet 1.1.5 proto-plus 1.23.0
protobuf 4.24.1 psutil 5.9.0 psycopg2 2.9.3
ptyprocess 0.7.0 pure-eval 0.2.2 py-cpuinfo 8.0.0
py-spy 0.3.14 pyarrow 14.0.1 pyarrow-hotfix 0.6
pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.12.0
pyccolo 0.0.52 pycparser 2.21 pydantic 1.10.6
Pygments 2.15.1 PyGObject 3.42.1 PyJWT 2.3.0
PyNaCl 1.5.0 pynvml 11.5.0 pyodbc 4.0.38
pyOpenSSL 23.2.0 pyparsing 3.0.9 pyrsistent 0.18.0
pytesseract 0.3.10 python-dateutil 2.8.2 python-editor 1.0.4
python-lsp-jsonrpc 1.1.1 python-snappy 0.6.1 pytz 2022.7
PyWavelets 1.4.1 PyYAML 6.0 pyzmq 23.2.0
ray 2.12.0 regex 2022.7.9 requests 2.31.0
requests-oauthlib 1.3.1 rich 13.7.1 rsa 4.9
s3transfer 0.10.1 safetensors 0.4.2 scikit-image 0.20.0
scikit-learn 1.3.0 scipy 1.11.1 seaborn 0.12.2
SecretStorage 3.3.1 Send2Trash 1.8.0 sentence-transformers 2.7.0
sentencepiece 0.1.99 setuptools 68.0.0 shap 0.44.0
simplejson 3.17.6 six 1.16.0 slicer 0.0.7
smart-open 5.2.1 smmap 5.0.0 sniffio 1.2.0
soundfile 0.12.1 soupsieve 2.4 soxr 0.3.7
spacy 3.7.2 spacy-legacy 3.0.12 spacy-loggers 1.0.5
spark-tensorflow-distributor 1.0.0 SQLAlchemy 1.4.39 sqlparse 0.4.2
srsly 2.4.8 ssh-import-id 5.11 stack-data 0.2.0
stanio 0.5.0 statsmodels 0.14.0 sympy 1.11.1
tangled-up-in-unicode 0.2.0 tenacity 8.2.2 tensorboard 2.16.2
tensorboard-data-server 0.7.2 tensorboard_plugin_profile 2.15.1 tensorboardX 2.6.2.2
tensorflow 2.16.1 tensorflow-estimator 2.15.0 tensorflow-io-gcs-filesystem 0.37.0
termcolor 2.4.0 terminado 0.17.1 textual 0.63.3
tf_keras 2.16.0 thinc 8.2.3 threadpoolctl 2.2.0
tifffile 2021.7.2 tiktoken 0.5.2 tinycss2 1.2.1
tokenize-rt 4.2.1 tokenizers 0.19.0 torch 2.3.0+cpu
torcheval 0.0.7 torchvision 0.18.0+cpu tornado 6.3.2
tqdm 4.65.0 traitlets 5.7.1 transformers 4.40.2
typeguard 2.13.3 typer 0.9.4 typing-inspect 0.9.0
typing_extensions 4.10.0 tzdata 2022.1 uc-micro-py 1.0.1
ujson 5.4.0 unattended-upgrades 0.1 urllib3 1.26.16
virtualenv 20.24.2 visions 0.7.5 wadllib 1.3.6
wasabi 1.1.2 wcwidth 0.2.5 weasel 0.3.4
webencodings 0.5.1 websocket-client 0.58.0 Werkzeug 2.2.3
wheel 0.38.4 wordcloud 1.9.3 wrapt 1.14.1
xgboost 2.0.3 xxhash 3.4.1 yarl 1.8.1
ydata-profiling 4.5.1 zipp 3.11.0 zstd 1.5.5.1

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.30.1 aiohttp 3.8.5
aiohttp-cors 0.7.0 aiosignal 1.2.0 anyio 3.5.0
argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 astor 0.8.1
asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.2
attrs 22.1.0 audioread 3.0.1 azure-core 1.30.1
azure-cosmos 4.3.1 azure-identity 1.16.0 azure-storage-blob 12.19.1
azure-storage-file-datalake 12.14.0 backcall 0.2.0 bcrypt 3.2.0
beautifulsoup4 4.12.2 black 23.3.0 bleach 4.1.0
blinker 1.4 blis 0.7.11 boto3 1.34.39
botocore 1.34.39 Brotli 1.0.9 cachetools 5.3.3
catalogue 2.0.10 category-encoders 2.6.3 certifi 2023.7.22
cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.4
circuitbreaker 1.4.0 click 8.0.4 cloudpathlib 0.16.0
cloudpickle 2.2.1 cmdstanpy 1.2.2 colorful 0.5.6
comm 0.1.2 confection 0.1.4 configparser 5.2.0
contourpy 1.0.5 cryptography 41.0.3 cycler 0.11.0
cymem 2.0.8 Cython 0.29.32 dacite 1.8.1
databricks-automl-runtime 0.2.21 databricks-feature-engineering 0.5.0 databricks-sdk 0.20.0
dataclasses-json 0.6.6 datasets 2.19.1 dbl-tempo 0.1.26
dbus-python 1.2.18 debugpy 1.6.7 decorator 5.1.1
deepspeed 0.14.0 defusedxml 0.7.1 dill 0.3.6
diskcache 5.6.3 distlib 0.3.8 dm-tree 0.1.8
einops 0.8.0 entrypoints 0.4 evaluate 0.4.2
executing 0.8.3 facets-overview 1.1.1 Farama-Notifications 0.0.4
fastjsonschema 2.19.1 fasttext 0.9.2 filelock 3.13.4
flash-attn 2.5.8 Flask 2.2.5 flatbuffers 24.3.25
fonttools 4.25.0 frozenlist 1.3.3 fsspec 2023.5.0
future 0.18.3 gast 0.4.0 gitdb 4.0.11
GitPython 3.1.27 google-api-core 2.18.0 google-auth 2.21.0
google-auth-oauthlib 1.0.0 google-cloud-core 2.4.1 google-cloud-storage 2.10.0
google-crc32c 1.5.0 google-pasta 0.2.0 google-resumable-media 2.7.0
googleapis-common-protos 1.63.0 greenlet 2.0.1 grpcio 1.60.0
grpcio-status 1.60.0 gunicorn 20.1.0 gviz-api 1.10.0
gymnasium 0.28.1 h11 0.14.0 h5py 3.10.0
hjson 3.1.0 holidays 0.45 horovod 0.28.1+db1
htmlmin 0.1.12 httpcore 1.0.5 httplib2 0.20.2
httpx 0.27.0 huggingface-hub 0.21.2 idna 3.4
ImageHash 4.3.1 imageio 2.31.1 imbalanced-learn 0.11.0
importlib-metadata 6.0.0 importlib_resources 6.4.0 ipyflow-core 0.0.198
ipykernel 6.25.1 ipython 8.15.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 isodate 0.6.1 itsdangerous 2.0.1
jax-jumpy 1.0.0 jedi 0.18.1 jeepney 0.7.1
Jinja2 3.1.2 jmespath 0.10.0 joblib 1.2.0
joblibspark 0.5.1 jsonpatch 1.33 jsonpointer 2.4
jsonschema 4.17.3 jupyter-server 1.23.4 jupyter_client 7.4.9
jupyter_core 5.3.0 jupyterlab-pygments 0.1.2 keras 3.1.1
keyring 23.5.0 kiwisolver 1.4.4 langchain 0.1.20
langchain-community 0.0.38 langchain-core 0.1.52 langchain-text-splitters 0.0.2
langcodes 3.4.0 langsmith 0.1.63 language_data 1.2.0
launchpadlib 1.10.16 lazr.restfulclient 0.14.4 lazr.uri 1.0.6
lazy_loader 0.2 libclang 15.0.6.1 librosa 0.10.1
lightgbm 4.3.0 linkify-it-py 2.0.0 llvmlite 0.40.0
lxml 4.9.2 lz4 4.3.2 Mako 1.2.0
marisa-trie 1.1.1 Markdown 3.4.1 markdown-it-py 2.2.0
MarkupSafe 2.1.1 marshmallow 3.21.2 matplotlib 3.7.2
matplotlib-inline 0.1.6 mdit-py-plugins 0.3.0 mdurl 0.1.0
memray 1.12.0 mistune 0.8.4 ml-dtypes 0.3.2
mlflow-skinny 2.11.3 more-itertools 8.10.0 mosaicml-streaming 0.7.4
mpmath 1.3.0 msal 1.28.0 msal-extensions 1.1.0
msgpack 1.0.8 multidict 6.0.2 multimethod 1.11.2
multiprocess 0.70.14 murmurhash 1.0.10 mypy-extensions 0.4.3
namex 0.0.8 nbclassic 0.5.5 nbclient 0.5.13
nbconvert 6.5.4 nbformat 5.7.0 nest-asyncio 1.5.6
networkx 3.1 ninja 1.11.1.1 nltk 3.8.1
notebook 6.5.4 notebook_shim 0.2.2 numba 0.57.1
numpy 1.23.5 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.0 oci 2.126.4
openai 1.29.0 opencensus 0.11.4 opencensus-context 0.1.3
opt-einsum 3.3.0 optree 0.11.0 orjson 3.10.3
packaging 23.2 pandas 1.5.3 pandocfilters 1.5.0
paramiko 3.4.0 parso 0.8.3 pathspec 0.10.3
patsy 0.5.3 petastorm 0.12.1 pexpect 4.8.0
phik 0.12.4 pickleshare 0.7.5 Pillow 9.4.0
pip 23.2.1 platformdirs 3.10.0 plotly 5.9.0
pmdarima 2.0.4 pooch 1.8.1 portalocker 2.8.2
preshed 3.0.9 prometheus-client 0.14.1 prompt-toolkit 3.0.36
prophet 1.1.5 proto-plus 1.23.0 protobuf 4.24.1
psutil 5.9.0 psycopg2 2.9.3 ptyprocess 0.7.0
pure-eval 0.2.2 py-cpuinfo 8.0.0 py-spy 0.3.14
pyarrow 14.0.1 pyarrow-hotfix 0.6 pyasn1 0.4.8
pyasn1-modules 0.2.8 pybind11 2.12.0 pyccolo 0.0.52
pycparser 2.21 pydantic 1.10.6 Pygments 2.15.1
PyGObject 3.42.1 PyJWT 2.3.0 PyNaCl 1.5.0
pynvml 11.5.0 pyodbc 4.0.38 pyOpenSSL 23.2.0
pyparsing 3.0.9 pyrsistent 0.18.0 pytesseract 0.3.10
python-dateutil 2.8.2 python-editor 1.0.4 python-lsp-jsonrpc 1.1.1
python-snappy 0.6.1 pytz 2022.7 PyWavelets 1.4.1
PyYAML 6.0 pyzmq 23.2.0 ray 2.12.0
regex 2022.7.9 requests 2.31.0 requests-oauthlib 1.3.1
rich 13.7.1 rsa 4.9 s3transfer 0.10.1
safetensors 0.4.2 scikit-image 0.20.0 scikit-learn 1.3.0
scipy 1.11.1 seaborn 0.12.2 SecretStorage 3.3.1
Send2Trash 1.8.0 sentence-transformers 2.7.0 sentencepiece 0.1.99
setuptools 68.0.0 shap 0.44.0 simplejson 3.17.6
six 1.16.0 slicer 0.0.7 smart-open 5.2.1
smmap 5.0.0 sniffio 1.2.0 soundfile 0.12.1
soupsieve 2.4 soxr 0.3.7 spacy 3.7.2
spacy-legacy 3.0.12 spacy-loggers 1.0.5 spark-tensorflow-distributor 1.0.0
SQLAlchemy 1.4.39 sqlparse 0.4.2 srsly 2.4.8
ssh-import-id 5.11 stack-data 0.2.0 stanio 0.5.0
statsmodels 0.14.0 sympy 1.11.1 tangled-up-in-unicode 0.2.0
tenacity 8.2.2 tensorboard 2.16.2 tensorboard-data-server 0.7.2
tensorboard_plugin_profile 2.15.1 tensorboardX 2.6.2.2 tensorflow 2.16.1
tensorflow-estimator 2.15.0 tensorflow-io-gcs-filesystem 0.37.0 termcolor 2.4.0
terminado 0.17.1 textual 0.63.3 tf_keras 2.16.0
thinc 8.2.3 threadpoolctl 2.2.0 tifffile 2021.7.2
tiktoken 0.5.2 tinycss2 1.2.1 tokenize-rt 4.2.1
tokenizers 0.19.0 torch 2.3.0+cu121 torcheval 0.0.7
torchvision 0.18.0+cu121 tornado 6.3.2 tqdm 4.65.0
traitlets 5.7.1 transformers 4.40.2 triton 2.3.0
typeguard 2.13.3 typer 0.9.4 typing-inspect 0.9.0
typing_extensions 4.10.0 tzdata 2022.1 uc-micro-py 1.0.1
ujson 5.4.0 unattended-upgrades 0.1 urllib3 1.26.16
virtualenv 20.24.2 visions 0.7.5 wadllib 1.3.6
wasabi 1.1.2 wcwidth 0.2.5 weasel 0.3.4
webencodings 0.5.1 websocket-client 0.58.0 Werkzeug 2.2.3
wheel 0.38.4 wordcloud 1.9.3 wrapt 1.14.1
xgboost 2.0.3 xxhash 3.4.1 yarl 1.8.1
ydata-profiling 4.5.1 zipp 3.11.0 zstd 1.5.5.1

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 15.3.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 15.3, Databricks Runtime 15.3 ML contains the following JARs:

CPU clusters

Group ID                  Artifact ID                        Version
com.typesafe.akka         akka-actor_2.12                    2.5.23
ml.dmlc                   xgboost4j-spark_2.12               1.7.3
ml.dmlc                   xgboost4j_2.12                     1.7.3
org.graphframes           graphframes_2.12                   0.8.3-db1-spark3.5
org.mlflow                mlflow-client                      2.11.1
org.scala-lang.modules    scala-java8-compat_2.12            0.8.0
org.tensorflow            spark-tensorflow-connector_2.12    1.15.0

GPU clusters

Group ID                  Artifact ID                        Version
com.typesafe.akka         akka-actor_2.12                    2.5.23
ml.dmlc                   xgboost4j-gpu_2.12                 1.7.3
ml.dmlc                   xgboost4j-spark-gpu_2.12           1.7.3
org.graphframes           graphframes_2.12                   0.8.3-db1-spark3.5
org.mlflow                mlflow-client                      2.11.1
org.scala-lang.modules    scala-java8-compat_2.12            0.8.0
org.tensorflow            spark-tensorflow-connector_2.12    1.15.0