Databricks Runtime 11.3 LTS for Machine Learning

Databricks Runtime 11.3 LTS for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 11.3 LTS. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

Note

LTS means this version is under long-term support. See Databricks Runtime LTS version lifecycle.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and machine learning on Databricks.

Tip

To see release notes for Databricks Runtime versions that have reached end-of-support (EoS), see End-of-support Databricks Runtime release notes. The EoS Databricks Runtime versions have been retired and might not be updated.

New features and improvements

Databricks Runtime 11.3 LTS ML is built on top of Databricks Runtime 11.3 LTS. For information on what's new in Databricks Runtime 11.3 LTS, including Apache Spark MLlib and SparkR, see the Databricks Runtime 11.3 LTS release notes.

Enhancements to Mosaic AutoML

Mosaic AutoML now supports the use of existing Feature Store feature tables in your AutoML experiments. For details, see Feature Store integration.

Trial notebooks generated by AutoML now contain code snippets that enable users to rerun hyperparameter tuning.

AutoML now supports DecimalType features.

Bug fixes

Databricks Runtime 11.3 LTS ML includes an upgraded version of sparkdl.xgboost. Previous versions of sparkdl.xgboost contain bugs that are fixed in this release, so Databricks recommends that users of the library upgrade to Databricks Runtime 11.3 LTS ML.

Prepare for future releases

An upcoming release of Databricks Runtime ML will include sklearn version 1.0. Visit the sklearn documentation for information on how to prepare for this change.

Databricks Runtime ML contains two openblas packages. The /opt/OpenBLAS package is deprecated in Databricks Runtime 11.3 LTS ML and will be removed in an upcoming release.

System environment

The system environment in Databricks Runtime 11.3 LTS ML differs from Databricks Runtime 11.3 LTS as follows:

Databricks Runtime 11.3 LTS ML includes XGBoost 1.6.1, which does not support GPU clusters with compute capability 5.2 and below.

Libraries

The following sections list the libraries included in Databricks Runtime 11.3 LTS ML that differ from those included in Databricks Runtime 11.3 LTS.

In this section:

Top-tier libraries

Databricks Runtime 11.3 LTS ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 11.3 LTS ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the following sections, Databricks Runtime 11.3 LTS ML also includes the following packages:

  • hyperopt 0.2.7.db1
  • sparkdl 2.3.0-db3
  • feature_store 0.7.0
  • automl 1.13.2

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-11.3.txt file and run pip install -r requirements-11.3.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 argon2-cffi 20.1.0 astor 0.8.1
astunparse 1.6.3 async-generator 1.10 attrs 21.2.0
azure-core 1.22.1 azure-cosmos 4.2.0 backcall 0.2.0
backports.entry-points-selectable 1.1.1 bcrypt 4.0.0 black 22.3.0
bleach 4.0.0 blis 0.7.8 boto3 1.21.18
botocore 1.24.18 cachetools 5.2.0 catalogue 2.0.8
certifi 2021.10.8 cffi 1.14.6 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.3 cloudpickle 2.0.0
cmdstanpy 0.9.68 confection 0.0.1 configparser 5.2.0
convertdate 2.4.0 cryptography 3.4.8 cycler 0.10.0
cymem 2.0.6 Cython 0.29.24 databricks-automl-runtime 0.2.11
databricks-cli 0.17.3 dbl-tempo 0.1.12 dbus-python 1.2.16
debugpy 1.4.1 decorator 5.1.0 defusedxml 0.7.1
dill 0.3.4 diskcache 5.4.0 distlib 0.3.6
entrypoints 0.3 ephem 4.1.3 facets-overview 1.0.0
fasttext 0.9.2 filelock 3.3.1 Flask 1.1.2
flatbuffers 1.12 fsspec 2021.8.1 future 0.18.2
gast 0.4.0 gitdb 4.0.9 GitPython 3.1.27
google-auth 2.6.0 google-auth-oauthlib 0.4.6 google-pasta 0.2.0
grpcio 1.44.0 gunicorn 20.1.0 gviz-api 1.10.0
h5py 3.3.0 hijri-converter 2.2.4 holidays 0.15
horovod 0.25.0 htmlmin 0.1.12 huggingface-hub 0.9.1
idna 3.2 ImageHash 4.3.0 imbalanced-learn 0.8.1
importlib-metadata 4.8.1 ipykernel 6.12.1 ipython 7.32.0
ipython-genutils 0.2.0 ipywidgets 7.7.0 isodate 0.6.1
itsdangerous 2.0.1 jedi 0.18.0 Jinja2 2.11.3
jmespath 0.10.0 joblib 1.0.1 joblibspark 0.5.0
jsonschema 3.2.0 jupyter-client 6.1.12 jupyter-core 4.8.1
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.9.0
Keras-Preprocessing 1.1.2 kiwisolver 1.3.1 korean-lunar-calendar 0.3.1
langcodes 3.3.0 libclang 14.0.6 lightgbm 3.3.2
llvmlite 0.37.0 LunarCalendar 0.0.9 Mako 1.2.0
Markdown 3.3.6 MarkupSafe 2.0.1 matplotlib 3.4.3
matplotlib-inline 0.1.2 missingno 0.5.1 mistune 0.8.4
mleap 0.20.0 mlflow-skinny 1.29.0 multimethod 1.9
murmurhash 1.0.8 mypy-extensions 0.4.3 nbclient 0.5.3
nbconvert 6.1.0 nbformat 5.1.3 nest-asyncio 1.5.1
networkx 2.6.3 nltk 3.6.5 notebook 6.4.5
numba 0.54.1 numpy 1.20.3 oauthlib 3.2.0
opt-einsum 3.3.0 packaging 21.0 pandas 1.3.4
pandas-profiling 3.1.0 pandocfilters 1.4.3 paramiko 2.9.2
parso 0.8.2 pathspec 0.9.0 pathy 0.6.2
patsy 0.5.2 petastorm 0.11.4 pexpect 4.8.0
phik 0.12.2 pickleshare 0.7.5 Pillow 8.4.0
pip 21.2.4 platformdirs 2.5.2 plotly 5.9.0
pmdarima 1.8.5 preshed 3.0.7 prometheus-client 0.11.0
prompt-toolkit 3.0.20 prophet 1.0.1 protobuf 3.19.4
psutil 5.8.0 psycopg2 2.9.3 ptyprocess 0.7.0
pyarrow 7.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8
pybind11 2.10.0 pycparser 2.20 pydantic 1.9.2
Pygments 2.10.0 PyGObject 3.36.0 PyJWT 2.5.0
PyMeeus 0.5.11 PyNaCl 1.5.0 pyodbc 4.0.31
pyparsing 3.0.4 pyrsistent 0.18.0 pystan 2.19.1.1
python-dateutil 2.8.2 python-editor 1.0.4 pytz 2021.3
PyWavelets 1.1.1 PyYAML 6.0 pyzmq 22.2.1
regex 2021.8.3 requests 2.26.0 requests-oauthlib 1.3.1
requests-unixsocket 0.2.0 rsa 4.9 s3transfer 0.5.2
scikit-learn 0.24.2 scipy 1.7.1 seaborn 0.11.3
Send2Trash 1.8.0 setuptools 58.0.4 setuptools-git 1.2
shap 0.41.0 simplejson 3.17.6 six 1.16.0
slicer 0.0.7 smart-open 5.2.1 smmap 5.0.0
spacy 3.4.1 spacy-legacy 3.0.10 spacy-loggers 1.0.3
spark-tensorflow-distributor 1.0.0 sqlparse 0.4.2 srsly 2.4.4
ssh-import-id 5.10 statsmodels 0.12.2 tabulate 0.8.9
tangled-up-in-unicode 0.1.0 tenacity 8.0.1 tensorboard 2.9.1
tensorboard-data-server 0.6.1 tensorboard-plugin-profile 2.8.0 tensorboard-plugin-wit 1.8.1
tensorflow-cpu 2.9.1 tensorflow-estimator 2.9.0 tensorflow-io-gcs-filesystem 0.27.0
termcolor 2.0.1 terminado 0.9.4 testpath 0.5.0
thinc 8.1.2 threadpoolctl 2.2.0 tokenize-rt 4.2.1
tokenizers 0.12.1 tomli 2.0.1 torch 1.12.1+cpu
torchvision 0.13.1+cpu tornado 6.1 tqdm 4.62.3
traitlets 5.1.0 transformers 4.21.2 typer 0.4.2
typing-extensions 3.10.0.2 ujson 4.0.2 unattended-upgrades 0.1
urllib3 1.26.7 virtualenv 20.8.0 visions 0.7.4
wasabi 0.10.1 wcwidth 0.2.5 webencodings 0.5.1
websocket-client 1.3.1 Werkzeug 2.0.2 wheel 0.37.0
widgetsnbextension 3.6.0 wrapt 1.12.1 xgboost 1.6.2
zipp 3.6.0

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 argon2-cffi 20.1.0 astor 0.8.1
astunparse 1.6.3 async-generator 1.10 attrs 21.2.0
azure-core 1.22.1 azure-cosmos 4.2.0 backcall 0.2.0
backports.entry-points-selectable 1.1.1 bcrypt 4.0.0 black 22.3.0
bleach 4.0.0 blis 0.7.8 boto3 1.21.18
botocore 1.24.18 cachetools 5.2.0 catalogue 2.0.8
certifi 2021.10.8 cffi 1.14.6 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.3 cloudpickle 2.0.0
cmdstanpy 0.9.68 confection 0.0.1 configparser 5.2.0
convertdate 2.4.0 cryptography 3.4.8 cycler 0.10.0
cymem 2.0.6 Cython 0.29.24 databricks-automl-runtime 0.2.11
databricks-cli 0.17.3 dbl-tempo 0.1.12 dbus-python 1.2.16
debugpy 1.4.1 decorator 5.1.0 defusedxml 0.7.1
dill 0.3.4 diskcache 5.4.0 distlib 0.3.6
entrypoints 0.3 ephem 4.1.3 facets-overview 1.0.0
fasttext 0.9.2 filelock 3.3.1 Flask 1.1.2
flatbuffers 1.12 fsspec 2021.8.1 future 0.18.2
gast 0.4.0 gitdb 4.0.9 GitPython 3.1.27
google-auth 2.6.0 google-auth-oauthlib 0.4.6 google-pasta 0.2.0
grpcio 1.44.0 gunicorn 20.1.0 gviz-api 1.10.0
h5py 3.3.0 hijri-converter 2.2.4 holidays 0.15
horovod 0.25.0 htmlmin 0.1.12 huggingface-hub 0.9.1
idna 3.2 ImageHash 4.3.0 imbalanced-learn 0.8.1
importlib-metadata 4.8.1 ipykernel 6.12.1 ipython 7.32.0
ipython-genutils 0.2.0 ipywidgets 7.7.0 isodate 0.6.1
itsdangerous 2.0.1 jedi 0.18.0 Jinja2 2.11.3
jmespath 0.10.0 joblib 1.0.1 joblibspark 0.5.0
jsonschema 3.2.0 jupyter-client 6.1.12 jupyter-core 4.8.1
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.9.0
Keras-Preprocessing 1.1.2 kiwisolver 1.3.1 korean-lunar-calendar 0.3.1
langcodes 3.3.0 libclang 14.0.6 lightgbm 3.3.2
llvmlite 0.37.0 LunarCalendar 0.0.9 Mako 1.2.0
Markdown 3.3.6 MarkupSafe 2.0.1 matplotlib 3.4.3
matplotlib-inline 0.1.2 missingno 0.5.1 mistune 0.8.4
mleap 0.20.0 mlflow-skinny 1.29.0 multimethod 1.9
murmurhash 1.0.8 mypy-extensions 0.4.3 nbclient 0.5.3
nbconvert 6.1.0 nbformat 5.1.3 nest-asyncio 1.5.1
networkx 2.6.3 nltk 3.6.5 notebook 6.4.5
numba 0.54.1 numpy 1.20.3 oauthlib 3.2.0
opt-einsum 3.3.0 packaging 21.0 pandas 1.3.4
pandas-profiling 3.1.0 pandocfilters 1.4.3 paramiko 2.9.2
parso 0.8.2 pathspec 0.9.0 pathy 0.6.2
patsy 0.5.2 petastorm 0.11.4 pexpect 4.8.0
phik 0.12.2 pickleshare 0.7.5 Pillow 8.4.0
pip 21.2.4 platformdirs 2.5.2 plotly 5.9.0
pmdarima 1.8.5 preshed 3.0.7 prompt-toolkit 3.0.20
prophet 1.0.1 protobuf 3.19.4 psutil 5.8.0
psycopg2 2.9.3 ptyprocess 0.7.0 pyarrow 7.0.0
pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.10.0
pycparser 2.20 pydantic 1.9.2 Pygments 2.10.0
PyGObject 3.36.0 PyJWT 2.5.0 PyMeeus 0.5.11
PyNaCl 1.5.0 pyodbc 4.0.31 pyparsing 3.0.4
pyrsistent 0.18.0 pystan 2.19.1.1 python-dateutil 2.8.2
python-editor 1.0.4 pytz 2021.3 PyWavelets 1.1.1
PyYAML 6.0 pyzmq 22.2.1 regex 2021.8.3
requests 2.26.0 requests-oauthlib 1.3.1 requests-unixsocket 0.2.0
rsa 4.9 s3transfer 0.5.2 scikit-learn 0.24.2
scipy 1.7.1 seaborn 0.11.3 Send2Trash 1.8.0
setuptools 58.0.4 setuptools-git 1.2 shap 0.41.0
simplejson 3.17.6 six 1.16.0 slicer 0.0.7
smart-open 5.2.1 smmap 5.0.0 spacy 3.4.1
spacy-legacy 3.0.10 spacy-loggers 1.0.3 spark-tensorflow-distributor 1.0.0
sqlparse 0.4.2 srsly 2.4.4 ssh-import-id 5.10
statsmodels 0.12.2 tabulate 0.8.9 tangled-up-in-unicode 0.1.0
tenacity 8.0.1 tensorboard 2.9.1 tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.8.0 tensorboard-plugin-wit 1.8.1 tensorflow 2.9.1
tensorflow-estimator 2.9.0 tensorflow-io-gcs-filesystem 0.27.0 termcolor 2.0.1
terminado 0.9.4 testpath 0.5.0 thinc 8.1.2
threadpoolctl 2.2.0 tokenize-rt 4.2.1 tokenizers 0.12.1
tomli 2.0.1 torch 1.12.1+cu113 torchvision 0.13.1+cu113
tornado 6.1 tqdm 4.62.3 traitlets 5.1.0
transformers 4.21.2 typer 0.4.2 typing-extensions 3.10.0.2
ujson 4.0.2 unattended-upgrades 0.1 urllib3 1.26.7
virtualenv 20.8.0 visions 0.7.4 wasabi 0.10.1
wcwidth 0.2.5 webencodings 0.5.1 websocket-client 1.3.1
Werkzeug 2.0.2 wheel 0.37.0 widgetsnbextension 3.6.0
wrapt 1.12.1 xgboost 1.6.2 zipp 3.6.0

R libraries

The R libraries are identical to the R Libraries in Databricks Runtime 11.3 LTS.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 11.3 LTS, Databricks Runtime 11.3 LTS ML contains the following JARs:

CPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 v0.20.0-db1
ml.dmlc xgboost4j-spark_2.12 1.6.2
ml.dmlc xgboost4j_2.12 1.6.2
org.graphframes graphframes_2.12 0.8.2-db1-spark3.2
org.mlflow mlflow-client 1.29.0
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0

GPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 v0.20.0-db1
ml.dmlc xgboost4j-gpu_2.12 1.6.2
ml.dmlc xgboost4j-spark-gpu_2.12 1.6.2
org.graphframes graphframes_2.12 0.8.2-db1-spark3.2
org.mlflow mlflow-client 1.29.0
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0