Databricks Runtime 5.1 ML (Beta)

Databricks released this image in December 2018.

Databricks Runtime 5.1 ML provides a ready-to-go environment for machine learning and data science, based on Databricks Runtime 5.1 (Unsupported). Databricks Runtimes for ML contain many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. Databricks Runtime 5.1 ML also supports distributed TensorFlow training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.1 ML is built on top of Databricks Runtime 5.1. For information on what's new in Databricks Runtime 5.1, see the Databricks Runtime 5.1 (Unsupported) release notes. In addition to the updates to existing libraries listed in Libraries, Databricks Runtime 5.1 ML includes the following new features:

  • PyTorch for building deep learning networks.

Note

Databricks Runtime ML releases pick up all maintenance updates to the base Databricks Runtime release. For a list of all maintenance updates, see Databricks runtime maintenance updates.

System environment

The system environment in Databricks Runtime 5.1 ML differs from that in Databricks Runtime 5.1 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • DBUtils: Databricks Runtime 5.1 ML does not contain Library utilities.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • CUDNN 7.2.1
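
The interpreter versions listed above can be checked from a notebook cell or script. The following is a minimal sketch, not an official Databricks utility: the names `EXPECTED_PYTHON` and `matches_listed_python` are hypothetical, and it only compares the running interpreter against the two versions stated in this section.

```python
import sys

# Interpreter versions listed in the system environment section,
# keyed by major version (Python 2 clusters and Python 3 clusters).
EXPECTED_PYTHON = {2: (2, 7, 15), 3: (3, 6, 5)}


def matches_listed_python(version_info=None):
    """Return True if the (major, minor, micro) version equals the listed one.

    version_info defaults to the running interpreter; pass a tuple to check
    a version reported elsewhere.
    """
    info = tuple((version_info or sys.version_info)[:3])
    return EXPECTED_PYTHON.get(info[0]) == info


if __name__ == "__main__":
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Matches listed version: %s" % matches_listed_python())
```

On a cluster that has received maintenance updates the result may legitimately differ, so a mismatch is a prompt to look closer rather than an error.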

Libraries

This section lists the differences between the libraries included in Databricks Runtime 5.1 and those included in Databricks Runtime 5.1 ML.

Python libraries

Databricks Runtime 5.1 ML uses Conda for Python package management. As a result, the pre-installed Python libraries differ substantially from those in Databricks Runtime. The following is the full list of provided Python packages and the versions installed using the Conda package manager.

Library Version Library Version Library Version
absl-py 0.6.1 argparse 1.4.0 asn1crypto 0.24.0
astor 0.7.1 backports-abc 0.5 backports.functools-lru-cache 1.5
backports.weakref 1.0.post1 bcrypt 3.1.4 bleach 2.1.3
boto 2.48.0 boto3 1.7.62 botocore 1.10.62
certifi 2018.04.16 cffi 1.11.5 chardet 3.0.4
cloudpickle 0.5.3 colorama 0.3.9 configparser 3.5.0
cryptography 2.2.2 cycler 0.10.0 Cython 0.28.2
decorator 4.3.0 docutils 0.14 entrypoints 0.2.3
enum34 1.1.6 et-xmlfile 1.0.1 funcsigs 1.0.2
functools32 3.2.3-2 fusepy 2.0.4 futures 3.2.0
gast 0.2.0 grpcio 1.12.1 h5py 2.8.0
horovod 0.15.0 html5lib 1.0.1 idna 2.6
ipaddress 1.0.22 ipython 5.7.0 ipython_genutils 0.2.0
jdcal 1.4 Jinja2 2.10 jmespath 0.9.3
jsonschema 2.6.0 jupyter-client 5.2.3 jupyter-core 4.4.0
Keras 2.2.4 Keras-Applications 1.0.6 Keras-Preprocessing 1.0.5
kiwisolver 1.0.1 linecache2 1.0.0 llvmlite 0.23.1
lxml 4.2.1 Markdown 3.0.1 MarkupSafe 1.0
matplotlib 2.2.2 mistune 0.8.3 mleap 0.8.1
mock 2.0.0 msgpack 0.5.6 nbconvert 5.3.1
nbformat 4.4.0 nose 1.3.7 nose-exclude 0.5.0
numba 0.38.0+0.g2a2b772fc.dirty numpy 1.14.3 olefile 0.45.1
openpyxl 2.5.3 pandas 0.23.0 pandocfilters 1.4.2
paramiko 2.4.1 pathlib2 2.3.2 patsy 0.5.0
pbr 5.1.1 pexpect 4.5.0 pickleshare 0.7.4
Pillow 5.1.0 pip 10.0.1 ply 3.11
prompt-toolkit 1.0.15 protobuf 3.6.1 psycopg2 2.7.5
ptyprocess 0.5.2 pyarrow 0.8.0 pyasn1 0.4.4
pycparser 2.18 Pygments 2.2.0 PyNaCl 1.3.0
pyOpenSSL 18.0.0 pyparsing 2.2.0 PySocks 1.6.8
Python 2.7.15 python-dateutil 2.7.3 pytz 2018.4
PyYAML 3.12 pyzmq 17.0.0 requests 2.18.4
s3transfer 0.1.13 scandir 1.7 scikit-learn 0.19.1
scipy 1.1.0 seaborn 0.8.1 setuptools 39.1.0
simplegeneric 0.8.1 singledispatch 3.4.0.3 six 1.11.0
statsmodels 0.9.0 subprocess32 3.5.3 tensorboard 1.12.0
tensorboardX 1.4 tensorflow 1.12.0 termcolor 1.1.0
testpath 0.3.1 torch 0.4.1 torchvision 0.2.1
tornado 5.0.2 traceback2 1.4.0 traitlets 4.3.2
unittest2 1.1.0 urllib3 1.22 virtualenv 16.0.0
wcwidth 0.1.7 webencodings 0.5.1 Werkzeug 0.14.1
wheel 0.31.1 wrapt 1.10.11 wsgiref 0.1.2
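
To confirm that an environment matches the table above, the listed versions can be compared with what the interpreter reports. The sketch below is illustrative only: `find_mismatches`, `installed_version`, and `EXPECTED` are hypothetical names, and `EXPECTED` holds just a few rows copied from the table. The lookup uses `importlib.metadata` where it exists (Python 3.8 and later, which is newer than the interpreters in this runtime) and otherwise falls back to `pkg_resources` from setuptools.

```python
try:
    from importlib.metadata import version as _dist_version  # Python 3.8+
except ImportError:
    _dist_version = None

# A few rows copied from the table above; extend as needed.
EXPECTED = {
    "tensorflow": "1.12.0",
    "torch": "0.4.1",
    "Keras": "2.2.4",
    "horovod": "0.15.0",
}


def installed_version(name):
    """Return the installed version string for a distribution name."""
    if _dist_version is not None:
        return _dist_version(name)
    import pkg_resources  # fallback for older interpreters with setuptools
    return pkg_resources.get_distribution(name).version


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for packages whose version differs.

    found is None when lookup(name) raises, for example because the
    package is not installed.
    """
    mismatches = {}
    for name, want in sorted(expected.items()):
        try:
            found = lookup(name)
        except Exception:
            found = None
        if found != want:
            mismatches[name] = (want, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` returns an empty dictionary when every listed package is present at the listed version. Passing a custom `lookup` makes the comparison usable against versions collected by other means.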

In addition, the following Spark packages include Python modules:

Spark Package Python Module Version
tensorframes tensorframes 0.6.0-s_2.11
graphframes graphframes 0.6.0-db3-spark2.4
spark-deep-learning sparkdl 1.4.0-db2-spark2.4
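
Note that the module name is not always the package name: spark-deep-learning is imported as `sparkdl`. A quick way to see which of these modules are importable in the current environment is sketched below. The helper names `importable` and `check_spark_package_modules` are hypothetical, and the check only catches `ImportError`; a module that fails during import for another reason would still raise.

```python
import importlib

# Spark package name -> Python module name, as listed in the table above.
SPARK_PACKAGE_MODULES = {
    "tensorframes": "tensorframes",
    "graphframes": "graphframes",
    "spark-deep-learning": "sparkdl",
}


def importable(module_name):
    """Return True if the module can be imported, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def check_spark_package_modules(modules=SPARK_PACKAGE_MODULES, probe=importable):
    """Map each Spark package name to whether its Python module is importable."""
    return {package: probe(module) for package, module in modules.items()}
```

On a Databricks Runtime 5.1 ML cluster all three entries would be expected to be `True`; elsewhere the result depends on what has been installed.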

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 5.1.

Java and Scala libraries (Scala 2.11 cluster)

In addition to Java and Scala libraries in Databricks Runtime 5.1, Databricks Runtime 5.1 ML contains the following JARs:

Group ID Artifact ID Version
com.databricks spark-deep-learning 1.4.0-db2-spark2.4
org.tensorframes tensorframes 0.6.0-s_2.11
org.graphframes graphframes_2.11 0.6.0-db3-spark2.4
org.tensorflow libtensorflow 1.12.0
org.tensorflow libtensorflow_jni 1.12.0
org.tensorflow spark-tensorflow-connector_2.11 1.12.0
org.tensorflow tensorflow 1.12.0
ml.dmlc xgboost4j 0.81
ml.dmlc xgboost4j-spark 0.81
ml.combust.mleap mleap-databricks-runtime_2.11 0.13.0
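
Each row above identifies a JAR by group ID, artifact ID, and version, which is commonly written as a single `groupId:artifactId:version` coordinate. The sketch below shows that formatting for a few rows from the table. The `Jar` type and `maven_coordinate` helper are hypothetical names used for illustration, and whether a given coordinate resolves in a public repository is not something this document states.

```python
from collections import namedtuple

# One table row: group ID, artifact ID, version.
Jar = namedtuple("Jar", ["group_id", "artifact_id", "version"])

# A few rows copied from the table above.
JARS = [
    Jar("com.databricks", "spark-deep-learning", "1.4.0-db2-spark2.4"),
    Jar("ml.dmlc", "xgboost4j", "0.81"),
    Jar("ml.dmlc", "xgboost4j-spark", "0.81"),
]


def maven_coordinate(jar):
    """Join the three fields into a groupId:artifactId:version string."""
    return ":".join(jar)


if __name__ == "__main__":
    for jar in JARS:
        print(maven_coordinate(jar))
```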