Databricks Runtime 12.1 for Machine Learning (EoS)

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

Databricks Runtime 12.1 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 12.1 (EoS). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and machine learning on Databricks.

New features and improvements

Databricks Runtime 12.1 ML is built on top of Databricks Runtime 12.1. For information on what's new in Databricks Runtime 12.1, including Apache Spark MLlib and SparkR, see the Databricks Runtime 12.1 (EoS) release notes.

Databricks AutoML

Starting with Databricks Runtime 12.1 ML, the AutoML Python API lets you specify a custom name for the experiment that AutoML generates. To set the name, pass the experiment_name parameter.
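The following is a minimal sketch of how experiment_name might be passed to an AutoML classification run. The helper automl_classify_kwargs, the column name "label", and the experiment name are hypothetical and only illustrate the idea. The databricks.automl module is available only on Databricks Runtime ML clusters, so the import is guarded and the actual call is left as a comment.

```python
from typing import Any, Dict


def automl_classify_kwargs(
    target_col: str, experiment_name: str, timeout_minutes: int = 30
) -> Dict[str, Any]:
    """Collect keyword arguments for an AutoML classification run.

    This helper is hypothetical; it only groups the arguments in one place.
    """
    return {
        "target_col": target_col,
        "experiment_name": experiment_name,  # custom experiment name (12.1 ML and above)
        "timeout_minutes": timeout_minutes,
    }


# Illustrative values, not taken from the release notes.
kwargs = automl_classify_kwargs("label", "churn-automl-baseline")

try:
    from databricks import automl  # present only on Databricks Runtime ML
except ImportError:
    automl = None

# On a Databricks Runtime 12.1 ML (or later) cluster, with `df` holding the
# training data, the run could then be started like this:
# summary = automl.classify(dataset=df, **kwargs)
print(kwargs["experiment_name"])
```

Keeping the arguments in a dictionary makes it easy to reuse the same experiment name across notebooks, but passing experiment_name directly to the call works just as well.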

For more information about Databricks AutoML, see What is AutoML?

System environment

The system environment in Databricks Runtime 12.1 ML differs from Databricks Runtime 12.1 as follows:

Databricks Runtime 12.1 ML includes XGBoost 1.7.2, which does not support GPU clusters whose GPUs have a compute capability of 5.2 or lower.
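One way to check whether a GPU falls under this restriction is to compare its compute capability against the 5.2 threshold stated above. The sketch below is an illustration rather than an official check: the function names are hypothetical, and it assumes PyTorch is installed so that torch.cuda.get_device_capability can report the capability of the first GPU.

```python
from typing import Optional, Tuple

# Per the note above: compute capability 5.2 and below is not supported.
MAX_UNSUPPORTED_CAPABILITY: Tuple[int, int] = (5, 2)


def xgboost_gpu_supported(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is above 5.2."""
    return tuple(capability) > MAX_UNSUPPORTED_CAPABILITY


def local_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the (major, minor) capability of GPU 0, or None if unavailable."""
    try:
        import torch  # assumed to be installed, as on Databricks Runtime ML
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = local_gpu_capability()
if capability is None:
    print("No CUDA GPU detected (or torch is not installed).")
else:
    print(capability, "supported by XGBoost 1.7.2:", xgboost_gpu_supported(capability))
```

Running this on the driver of a GPU cluster only inspects the driver's first GPU; worker nodes with a different GPU type would need the same check.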

Libraries

The following sections list the libraries included in Databricks Runtime 12.1 ML that differ from those included in Databricks Runtime 12.1.

Top-tier libraries

Databricks Runtime 12.1 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 12.1 ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 12.1 ML also includes the following packages:

  • hyperopt 0.2.7.db1
  • sparkdl 2.3.0-db3
  • automl 1.15.0

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-12.1.txt file and run pip install -r requirements-12.1.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
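After installing from the requirements file, it can be useful to confirm that the local environment actually matches the pinned versions. The sketch below assumes the file uses the common name==version pin format, which this page does not confirm. The helper names parse_pins and find_mismatches are hypothetical, and the sample lines are taken from the library tables on this page rather than from the real file.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Extract {package: version} from lines of the form 'name==version'."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0]   # drop trailing comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" not in line:
            continue  # skip blanks, pip options, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} where versions differ."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Sample pins copied from the tables on this page; for a real check, read the
# downloaded file instead:
#   pins = parse_pins(open("requirements-12.1.txt"))
sample = ["numpy==1.21.5", "pandas==1.4.2  # pinned", "", "xgboost==1.7.2"]
pins = parse_pins(sample)
for name, (expected, installed) in sorted(find_mismatches(pins).items()):
    print(f"{name}: expected {expected}, installed {installed}")
```

An empty result means every pinned package is installed at the expected version. Libraries developed by Databricks are not in the file, so they are not covered by this check.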

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0
astor 0.8.1 asttokens 2.0.5 astunparse 1.6.3
attrs 21.4.0 azure-core 1.26.1 azure-cosmos 4.2.0
backcall 0.2.0 backports.entry-points-selectable 1.2.0 bcrypt 3.2.0
beautifulsoup4 4.11.1 black 22.3.0 bleach 4.1.0
blis 0.7.9 boto3 1.21.32 botocore 1.24.32
cachetools 4.2.2 catalogue 2.0.8 category-encoders 2.5.1.post0
certifi 2021.10.8 cffi 1.15.0 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.4 cloudpickle 2.0.0
cmdstanpy 1.0.8 confection 0.0.3 configparser 5.2.0
convertdate 2.4.0 cryptography 3.4.8 cycler 0.11.0
cymem 2.0.7 Cython 0.29.28 databricks-automl-runtime 0.2.14
databricks-cli 0.17.4 databricks-feature-store 0.9.0 dbl-tempo 0.1.12
dbus-python 1.2.16 debugpy 1.5.1 decorator 5.1.1
defusedxml 0.7.1 dill 0.3.4 diskcache 5.4.0
distlib 0.3.6 docstring-to-markdown 0.11 entrypoints 0.4
ephem 4.1.4 executing 0.8.3 facets-overview 1.0.0
fastjsonschema 2.16.2 fasttext 0.9.2 filelock 3.6.0
Flask 1.1.2 flatbuffers 22.12.6 fonttools 4.25.0
fsspec 2022.2.0 future 0.18.2 gast 0.4.0
gitdb 4.0.10 GitPython 3.1.27 google-auth 1.33.0
google-auth-oauthlib 0.4.6 google-pasta 0.2.0 grpcio 1.42.0
gunicorn 20.1.0 gviz-api 1.10.0 h5py 3.6.0
hijri-converter 2.2.4 holidays 0.17.2 horovod 0.26.1
htmlmin 0.1.12 huggingface-hub 0.11.1 idna 3.3
ImageHash 4.3.1 imbalanced-learn 0.8.1 importlib-metadata 4.11.3
ipykernel 6.15.3 ipython 8.5.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 isodate 0.6.1 itsdangerous 2.0.1
jedi 0.18.1 Jinja2 2.11.3 jmespath 0.10.0
joblib 1.1.0 joblibspark 0.5.0 jsonschema 4.4.0
jupyter-client 6.1.12 jupyter_core 4.11.2 jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0 keras 2.10.0 Keras-Preprocessing 1.1.2
kiwisolver 1.3.2 korean-lunar-calendar 0.3.1 langcodes 3.3.0
libclang 14.0.6 lightgbm 3.3.3 llvmlite 0.38.0
LunarCalendar 0.0.9 Mako 1.2.0 Markdown 3.3.4
MarkupSafe 2.0.1 matplotlib 3.5.1 matplotlib-inline 0.1.2
mccabe 0.7.0 mistune 0.8.4 mleap 0.20.0
mlflow-skinny 2.1.1 multimethod 1.9.1 murmurhash 1.0.9
mypy-extensions 0.4.3 nbclient 0.5.13 nbconvert 6.4.4
nbformat 5.3.0 nest-asyncio 1.5.5 networkx 2.7.1
nltk 3.7 nodeenv 1.7.0 notebook 6.4.8
numba 0.55.1 numpy 1.21.5 oauthlib 3.2.0
opt-einsum 3.3.0 packaging 21.3 pandas 1.4.2
pandas-profiling 3.5.0 pandocfilters 1.5.0 paramiko 2.9.2
parso 0.8.3 pathspec 0.9.0 pathy 0.6.1
patsy 0.5.2 petastorm 0.12.0 pexpect 4.8.0
phik 0.12.3 pickleshare 0.7.5 Pillow 9.0.1
pip 21.2.4 platformdirs 2.6.0 plotly 5.6.0
pluggy 1.0.0 pmdarima 2.0.2 preshed 3.0.8
prometheus-client 0.13.1 prompt-toolkit 3.0.20 prophet 1.1.1
protobuf 3.19.4 psutil 5.8.0 psycopg2 2.9.3
ptyprocess 0.7.0 pure-eval 0.2.2 pyarrow 7.0.0
pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.10.1
pycparser 2.21 pydantic 1.10.2 pyflakes 2.5.0
Pygments 2.11.2 PyGObject 3.36.0 PyJWT 2.6.0
PyMeeus 0.5.12 PyNaCl 1.5.0 pyodbc 4.0.32
pyparsing 3.0.4 pyright 1.1.283 pyrsistent 0.18.0
python-dateutil 2.8.2 python-editor 1.0.4 python-lsp-jsonrpc 1.0.0
python-lsp-server 1.6.0 pytz 2021.3 PyWavelets 1.3.0
PyYAML 6.0 pyzmq 22.3.0 regex 2022.3.15
requests 2.27.1 requests-oauthlib 1.3.1 requests-unixsocket 0.2.0
rope 0.22.0 rsa 4.7.2 s3transfer 0.5.0
scikit-learn 1.0.2 scipy 1.7.3 seaborn 0.11.2
Send2Trash 1.8.0 setuptools 61.2.0 setuptools-git 1.2
shap 0.41.0 simplejson 3.17.6 six 1.16.0
slicer 0.0.7 smart-open 5.1.0 smmap 5.0.0
soupsieve 2.3.1 spacy 3.4.3 spacy-legacy 3.0.10
spacy-loggers 1.0.4 spark-tensorflow-distributor 1.0.0 sqlparse 0.4.2
srsly 2.4.5 ssh-import-id 5.10 stack-data 0.2.0
statsmodels 0.13.2 tabulate 0.8.9 tangled-up-in-unicode 0.2.0
tenacity 8.0.1 tensorboard 2.10.0 tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.8.0 tensorboard-plugin-wit 1.8.1 tensorflow-cpu 2.10.0
tensorflow-estimator 2.10.0 tensorflow-io-gcs-filesystem 0.29.0 termcolor 2.1.1
terminado 0.13.1 testpath 0.5.0 thinc 8.1.6
threadpoolctl 2.2.0 tokenize-rt 4.2.1 tokenizers 0.13.2
tomli 1.2.2 torch 1.13.0+cpu torchvision 0.14.0+cpu
tornado 6.1 tqdm 4.64.0 traitlets 5.1.1
transformers 4.25.1 typeguard 2.13.3 typer 0.7.0
typing_extensions 4.1.1 ujson 5.1.0 unattended-upgrades 0.1
urllib3 1.26.9 virtualenv 20.8.0 visions 0.7.5
wasabi 0.10.1 wcwidth 0.2.5 webencodings 0.5.1
websocket-client 0.58.0 Werkzeug 2.0.3 whatthepatch 1.0.3
wheel 0.37.1 widgetsnbextension 3.6.1 wrapt 1.12.1
xgboost 1.7.2 yapf 0.31.0 zipp 3.7.0

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0
astor 0.8.1 asttokens 2.0.5 astunparse 1.6.3
attrs 21.4.0 azure-core 1.26.1 azure-cosmos 4.2.0
backcall 0.2.0 backports.entry-points-selectable 1.2.0 bcrypt 3.2.0
beautifulsoup4 4.11.1 black 22.3.0 bleach 4.1.0
blis 0.7.9 boto3 1.21.32 botocore 1.24.32
cachetools 4.2.2 catalogue 2.0.8 category-encoders 2.5.1.post0
certifi 2021.10.8 cffi 1.15.0 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.4 cloudpickle 2.0.0
cmdstanpy 1.0.8 confection 0.0.3 configparser 5.2.0
convertdate 2.4.0 cryptography 3.4.8 cycler 0.11.0
cymem 2.0.7 Cython 0.29.28 databricks-automl-runtime 0.2.14
databricks-cli 0.17.4 databricks-feature-store 0.9.0 dbl-tempo 0.1.12
dbus-python 1.2.16 debugpy 1.5.1 decorator 5.1.1
defusedxml 0.7.1 dill 0.3.4 diskcache 5.4.0
distlib 0.3.6 docstring-to-markdown 0.11 entrypoints 0.4
ephem 4.1.4 executing 0.8.3 facets-overview 1.0.0
fastjsonschema 2.16.2 fasttext 0.9.2 filelock 3.6.0
Flask 1.1.2 flatbuffers 22.12.6 fonttools 4.25.0
fsspec 2022.2.0 future 0.18.2 gast 0.4.0
gitdb 4.0.10 GitPython 3.1.27 google-auth 1.33.0
google-auth-oauthlib 0.4.6 google-pasta 0.2.0 grpcio 1.42.0
gunicorn 20.1.0 gviz-api 1.10.0 h5py 3.6.0
hijri-converter 2.2.4 holidays 0.17.2 horovod 0.26.1
htmlmin 0.1.12 huggingface-hub 0.11.1 idna 3.3
ImageHash 4.3.1 imbalanced-learn 0.8.1 importlib-metadata 4.11.3
ipykernel 6.15.3 ipython 8.5.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 isodate 0.6.1 itsdangerous 2.0.1
jedi 0.18.1 Jinja2 2.11.3 jmespath 0.10.0
joblib 1.1.0 joblibspark 0.5.0 jsonschema 4.4.0
jupyter-client 6.1.12 jupyter_core 4.11.2 jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0 keras 2.10.0 Keras-Preprocessing 1.1.2
kiwisolver 1.3.2 korean-lunar-calendar 0.3.1 langcodes 3.3.0
libclang 14.0.6 lightgbm 3.3.3 llvmlite 0.38.0
LunarCalendar 0.0.9 Mako 1.2.0 Markdown 3.3.4
MarkupSafe 2.0.1 matplotlib 3.5.1 matplotlib-inline 0.1.2
mccabe 0.7.0 mistune 0.8.4 mleap 0.20.0
mlflow-skinny 2.1.1 multimethod 1.9.1 murmurhash 1.0.9
mypy-extensions 0.4.3 nbclient 0.5.13 nbconvert 6.4.4
nbformat 5.3.0 nest-asyncio 1.5.5 networkx 2.7.1
nltk 3.7 nodeenv 1.7.0 notebook 6.4.8
numba 0.55.1 numpy 1.21.5 oauthlib 3.2.0
opt-einsum 3.3.0 packaging 21.3 pandas 1.4.2
pandas-profiling 3.5.0 pandocfilters 1.5.0 paramiko 2.9.2
parso 0.8.3 pathspec 0.9.0 pathy 0.6.1
patsy 0.5.2 petastorm 0.12.0 pexpect 4.8.0
phik 0.12.3 pickleshare 0.7.5 Pillow 9.0.1
pip 21.2.4 platformdirs 2.6.0 plotly 5.6.0
pluggy 1.0.0 pmdarima 2.0.2 preshed 3.0.8
prompt-toolkit 3.0.20 prophet 1.1.1 protobuf 3.19.4
psutil 5.8.0 psycopg2 2.9.3 ptyprocess 0.7.0
pure-eval 0.2.2 pyarrow 7.0.0 pyasn1 0.4.8
pyasn1-modules 0.2.8 pybind11 2.10.1 pycparser 2.21
pydantic 1.10.2 pyflakes 2.5.0 Pygments 2.11.2
PyGObject 3.36.0 PyJWT 2.6.0 PyMeeus 0.5.12
PyNaCl 1.5.0 pyodbc 4.0.32 pyparsing 3.0.4
pyright 1.1.283 pyrsistent 0.18.0 python-dateutil 2.8.2
python-editor 1.0.4 python-lsp-jsonrpc 1.0.0 python-lsp-server 1.6.0
pytz 2021.3 PyWavelets 1.3.0 PyYAML 6.0
pyzmq 22.3.0 regex 2022.3.15 requests 2.27.1
requests-oauthlib 1.3.1 requests-unixsocket 0.2.0 rope 0.22.0
rsa 4.7.2 s3transfer 0.5.0 scikit-learn 1.0.2
scipy 1.7.3 seaborn 0.11.2 Send2Trash 1.8.0
setuptools 61.2.0 setuptools-git 1.2 shap 0.41.0
simplejson 3.17.6 six 1.16.0 slicer 0.0.7
smart-open 5.1.0 smmap 5.0.0 soupsieve 2.3.1
spacy 3.4.3 spacy-legacy 3.0.10 spacy-loggers 1.0.4
spark-tensorflow-distributor 1.0.0 sqlparse 0.4.2 srsly 2.4.5
ssh-import-id 5.10 stack-data 0.2.0 statsmodels 0.13.2
tabulate 0.8.9 tangled-up-in-unicode 0.2.0 tenacity 8.0.1
tensorboard 2.10.0 tensorboard-data-server 0.6.1 tensorboard-plugin-profile 2.8.0
tensorboard-plugin-wit 1.8.1 tensorflow 2.10.0 tensorflow-estimator 2.10.0
tensorflow-io-gcs-filesystem 0.29.0 termcolor 2.1.1 terminado 0.13.1
testpath 0.5.0 thinc 8.1.6 threadpoolctl 2.2.0
tokenize-rt 4.2.1 tokenizers 0.13.2 tomli 1.2.2
torch 1.13.0+cu117 torchvision 0.14.0+cu117 tornado 6.1
tqdm 4.64.0 traitlets 5.1.1 transformers 4.25.1
typeguard 2.13.3 typer 0.7.0 typing_extensions 4.1.1
ujson 5.1.0 unattended-upgrades 0.1 urllib3 1.26.9
virtualenv 20.8.0 visions 0.7.5 wasabi 0.10.1
wcwidth 0.2.5 webencodings 0.5.1 websocket-client 0.58.0
Werkzeug 2.0.3 whatthepatch 1.0.3 wheel 0.37.1
widgetsnbextension 3.6.1 wrapt 1.12.1 xgboost 1.7.2
yapf 0.31.0 zipp 3.7.0

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 12.1.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 12.1, Databricks Runtime 12.1 ML contains the following JARs:

CPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 v0.20.0-db1
ml.dmlc xgboost4j-spark_2.12 1.6.2
ml.dmlc xgboost4j_2.12 1.6.2
org.graphframes graphframes_2.12 0.8.2-db1-spark3.2
org.mlflow mlflow-client 2.0.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0

GPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 v0.20.0-db1
ml.dmlc xgboost4j-gpu_2.12 1.6.2
ml.dmlc xgboost4j-spark-gpu_2.12 1.6.2
org.graphframes graphframes_2.12 0.8.2-db1-spark3.2
org.mlflow mlflow-client 2.0.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0