Databricks Runtime 7.0 ML (EoS)

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

Databricks released this version in June 2020.

Databricks Runtime 7.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 7.0 (EoS). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and machine learning on Databricks.

New features and major changes

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0. For information on what's new in Databricks Runtime 7.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 7.0 (EoS) release notes.

GPU-aware scheduling

Databricks Runtime 7.0 ML supports GPU-aware scheduling from Apache Spark 3.0. Azure Databricks automatically configures it for you. See GPU scheduling.

Major changes to ML Python environment

This section describes the major changes to the pre-installed ML Python environment compared to Databricks Runtime 6.6 ML (EoS). You should also review the major changes to the base Python environment in Databricks Runtime 7.0 (EoS). For a full list of installed Python packages and their versions, see Python libraries.

Python packages upgraded

  • tensorflow 1.15.0 -> 2.2.0
  • tensorboard 1.15.0 -> 2.2.2
  • pytorch 1.4.0 -> 1.5.0
  • xgboost 0.90 -> 1.1.1
  • sparkdl 1.6.0-db1 -> 2.1.0-db1
  • hyperopt 0.2.2.db1 -> 0.2.4.db1

Python packages added

  • lightgbm: 2.3.0
  • nltk: 3.4.5
  • petastorm: 0.9.2
  • plotly: 4.5.2

Python packages removed

  • argparse
  • boto (use boto3 instead)
  • colorama
  • deprecated
  • et-xmlfile
  • fusepy
  • html5lib
  • jdcal
  • keras (use tensorflow.keras instead)
  • keras-applications (use tensorflow.keras.applications instead)
  • llvmlite
  • lxml
  • nose
  • nose-exclude
  • numba
  • openpyxl
  • pathlib2
  • ply
  • pymongo
  • singledispatch
  • tensorboardX (use torch.utils.tensorboard instead)
  • virtualenv
  • webencodings

Major changes to ML R environment

Databricks Runtime 7.0 ML includes an unmodified version of RStudio Server Open Source v1.2.5033 for which the source code can be found in GitHub. Read more about RStudio Server on Azure Databricks.

Changes to ML Spark packages, Java and Scala libraries

The following packages are upgraded. Some are upgraded to SNAPSHOT releases that are compatible with Apache Spark 3.0:

  • graphframes: 0.7.0-db1-spark2.4 -> 0.8.0-db2-spark3.0
  • spark-tensorflow-connector: 1.15.0 (Scala 2.11) -> 1.15.0 (Scala 2.12)
  • xgboost4j and xgboost4j-spark: 0.90 -> 1.0.0
  • mleap-databricks-runtime: 0.17.0-4882dc3 (SNAPSHOT)

The following packages are removed:

  • TensorFlow (Java)
  • TensorFrames
  • Deep Learning Pipelines for Apache Spark (HorovodRunner is available in Python)

Added conda and pip commands to support notebook-scoped Python libraries (public preview)

Starting with Databricks Runtime 7.0 ML, you can use %pip and %conda commands to manage Python libraries installed in a notebook session. You can also use these commands to create a custom environment for a notebook and to reproduce this environment between notebooks. To enable this feature, in cluster settings, set the Spark configuration spark.databricks.conda.condaMagic.enabled true. For more information, see Notebook-scoped Python libraries.

Deprecations and unsupported features

Databricks Runtime 7.0 ML does not support table access control. If you need table access control, we recommend that you use Databricks Runtime 7.0.

Known issues

  • If you're logging an MLlib model in mleap format, when the sample_input argument is passed to mlflow.spark.log_model it fails with an AttributeError. This issue is caused by an API change to mleap. To work around this issue, upgrade to MLflow 1.9.0. You can install MLflow 1.9.0 using Notebook-scoped Python libraries.

System environment

The system environment in Databricks Runtime 7.0 ML differs from Databricks Runtime 7.0 as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 7.0 ML that differ from those included in Databricks Runtime 7.0.

In this section:

Top-tier libraries

Databricks Runtime 7.0 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 7.0 ML uses Conda for Python package management and includes many popular ML packages. The following section describes the Conda environment for Databricks Runtime 7.0 ML.

Python on CPU clusters

name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=4.1.0=py_1
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cpuonly=1.0=0
  - cryptography=2.8=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb2=2.0.6=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.5.2=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.0=py3.7_cpu_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.0=py37_cpu
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.1
    - gorilla==0.3.0
    - horovod==0.19.1
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - mleap==0.16.0
    - mlflow==1.8.0
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.6.0.post3
    - tensorflow-cpu==2.2.0
    - tensorflow-estimator==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml

Python on GPU clusters

name: databricks-ml-gpu
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=4.1.0=py_1
  - certifi=2020.4.5.2=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb2=2.0.6=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.5.2=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.0=py37_cu101
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.1
    - gorilla==0.3.0
    - horovod==0.19.1
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - mleap==0.16.0
    - mlflow==1.8.0
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.6.0.post3
    - tensorflow-estimator==2.2.0
    - tensorflow-gpu==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml-gpu

Spark packages containing Python modules

Spark Package Python Module Version
graphframes graphframes 0.8.0-db2-spark3.0

R libraries

The R libraries are identical to the R Libraries in Databricks Runtime 7.0 Beta.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 7.0, Databricks Runtime 7.0 ML contains the following JARs:

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 0.17.0-4882dc3
ml.dmlc xgboost4j-spark_2.12 1.0.0
ml.dmlc xgboost4j_2.12 1.0.0
org.mlflow mlflow-client 1.8.0
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0