Databricks Runtime 5.2 ML (Beta)

Databricks released this image in January 2019.

Databricks Runtime 5.2 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 5.2 (Unsupported). Databricks Runtime for ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed TensorFlow training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.2 ML is built on top of Databricks Runtime 5.2. For information on what's new in Databricks Runtime 5.2, see the Databricks Runtime 5.2 (Unsupported) release notes. In addition to library updates, Databricks Runtime 5.2 ML introduces the following new features:

  • GraphFrames now supports the Pregel API (Python) with Databricks performance optimizations.
  • HorovodRunner adds:
    • On a GPU cluster, training processes are mapped to GPUs instead of worker nodes, which simplifies support for multi-GPU instance types. This built-in support lets you distribute training to all of the GPUs on a multi-GPU machine without custom code.
    • HorovodRunner.run() now returns the return value from the first training process.
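
The HorovodRunner behavior described above can be sketched as follows. This is a minimal illustration, not code from the release: train_fn, run_distributed, num_processes, and steps are hypothetical names, and the toy training function only halves a placeholder loss instead of training a model. It assumes the sparkdl module listed later in these notes is available, which is the case only on a Databricks Runtime ML cluster.

```python
def train_fn(steps):
    """Toy stand-in for a training function (hypothetical).

    A real function would call hvd.init() and train a model; this one only
    halves a placeholder loss so the returned value is easy to follow.
    """
    loss = 1.0
    for _ in range(steps):
        loss *= 0.5
    return loss


def run_distributed(num_processes, steps):
    """Launch train_fn with HorovodRunner and return its result."""
    # Imported here because sparkdl is only present on Databricks Runtime ML.
    from sparkdl import HorovodRunner

    # np is the number of parallel training processes; per the notes above,
    # on a GPU cluster each process is mapped to a GPU.
    hr = HorovodRunner(np=num_processes)
    # Keyword arguments are passed through to train_fn. With this release,
    # run() returns the return value of the first training process.
    return hr.run(train_fn, steps=steps)
```

On a cluster, a call such as run_distributed(4, steps=3) would start four training processes and hand back the value returned by the first one. Calling train_fn directly is a quick way to check the function locally before distributing it.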

Note

Databricks Runtime ML releases pick up all maintenance updates to the base Databricks Runtime release. For a list of all maintenance updates, see Databricks runtime maintenance updates.

System environment

The system environment in Databricks Runtime 5.2 ML differs from Databricks Runtime 5.2 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • DBUtils: Databricks Runtime 5.2 ML does not contain Library utilities.
  • GPU clusters include the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • cuDNN 7.2.1

Libraries

The following sections list the libraries included in Databricks Runtime 5.2 ML that differ from those included in Databricks Runtime 5.2.

Python libraries

Databricks Runtime 5.2 ML uses Conda for Python package management. As a result, there are major differences in pre-installed Python libraries compared to Databricks Runtime. The following is a full list of the provided Python packages and versions installed using the Conda package manager.

Library Version Library Version Library Version
absl-py 0.6.1 argparse 1.4.0 asn1crypto 0.24.0
astor 0.7.1 backports-abc 0.5 backports.functools-lru-cache 1.5
backports.weakref 1.0.post1 bcrypt 3.1.5 bleach 2.1.3
boto 2.48.0 boto3 1.7.62 botocore 1.10.62
certifi 2018.04.16 cffi 1.11.5 chardet 3.0.4
cloudpickle 0.5.3 colorama 0.3.9 configparser 3.5.0
cryptography 2.2.2 cycler 0.10.0 Cython 0.28.2
decorator 4.3.0 docutils 0.14 entrypoints 0.2.3
enum34 1.1.6 et-xmlfile 1.0.1 funcsigs 1.0.2
functools32 3.2.3-2 fusepy 2.0.4 futures 3.2.0
gast 0.2.0 grpcio 1.12.1 h5py 2.8.0
horovod 0.15.2 html5lib 1.0.1 idna 2.6
ipaddress 1.0.22 ipython 5.7.0 ipython_genutils 0.2.0
jdcal 1.4 Jinja2 2.10 jmespath 0.9.3
jsonschema 2.6.0 jupyter-client 5.2.3 jupyter-core 4.4.0
Keras 2.2.4 Keras-Applications 1.0.6 Keras-Preprocessing 1.0.5
kiwisolver 1.0.1 linecache2 1.0.0 llvmlite 0.23.1
lxml 4.2.1 Markdown 3.0.1 MarkupSafe 1.0
matplotlib 2.2.2 mistune 0.8.3 mleap 0.8.1
mock 2.0.0 msgpack 0.5.6 nbconvert 5.3.1
nbformat 4.4.0 nose 1.3.7 nose-exclude 0.5.0
numba 0.38.0+0.g2a2b772fc.dirty numpy 1.14.3 olefile 0.45.1
openpyxl 2.5.3 pandas 0.23.0 pandocfilters 1.4.2
paramiko 2.4.1 pathlib2 2.3.2 patsy 0.5.0
pbr 5.1.1 pexpect 4.5.0 pickleshare 0.7.4
Pillow 5.1.0 pip 10.0.1 ply 3.11
prompt-toolkit 1.0.15 protobuf 3.6.1 psycopg2 2.7.5
ptyprocess 0.5.2 pyarrow 0.8.0 pyasn1 0.4.4
pycparser 2.18 Pygments 2.2.0 PyNaCl 1.3.0
pyOpenSSL 18.0.0 pyparsing 2.2.0 PySocks 1.6.8
Python 2.7.15 python-dateutil 2.7.3 pytz 2018.4
PyYAML 3.12 pyzmq 17.0.0 requests 2.18.4
s3transfer 0.1.13 scandir 1.7 scikit-learn 0.19.1
scipy 1.1.0 seaborn 0.8.1 setuptools 39.1.0
simplegeneric 0.8.1 singledispatch 3.4.0.3 six 1.11.0
statsmodels 0.9.0 subprocess32 3.5.3 tensorboard 1.12.2
tensorboardX 1.4 tensorflow 1.12.0 termcolor 1.1.0
testpath 0.3.1 torch 0.4.1 torchvision 0.2.1
tornado 5.0.2 traceback2 1.4.0 traitlets 4.3.2
unittest2 1.1.0 urllib3 1.22 virtualenv 16.0.0
wcwidth 0.1.7 webencodings 0.5.1 Werkzeug 0.14.1
wheel 0.31.1 wrapt 1.10.11 wsgiref 0.1.2
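
One way to confirm that a cluster matches the versions listed above is to compare them against what is installed. The following is a rough sketch rather than a Databricks utility: EXPECTED, installed_version, and find_mismatches are hypothetical names, only a handful of rows from the table are copied in, and the default lookup assumes pkg_resources (shipped with setuptools) is importable.

```python
# A small subset of the package versions listed in the table above.
EXPECTED = {
    "tensorflow": "1.12.0",
    "torch": "0.4.1",
    "Keras": "2.2.4",
    "horovod": "0.15.2",
    "numpy": "1.14.3",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    import pkg_resources  # shipped with setuptools

    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches
```

Running find_mismatches(EXPECTED) in a notebook cell should return an empty dictionary when the listed packages match. The lookup parameter exists so the comparison can be exercised with a plain dictionary's get method instead of the real environment.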

In addition, the following Spark packages include Python modules:

Spark Package Python Module Version
graphframes graphframes 0.7.0-db1-spark2.4
spark-deep-learning sparkdl 1.5.0-db1-spark2.4
tensorframes tensorframes 0.6.0-s_2.11

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 5.2.

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 5.2, Databricks Runtime 5.2 ML contains the following JARs:

Group ID Artifact ID Version
com.databricks spark-deep-learning 1.5.0-db1-spark2.4
com.typesafe.akka akka-actor_2.11 2.3.11
ml.combust.mleap mleap-databricks-runtime_2.11 0.13.0
ml.dmlc xgboost4j 0.81
ml.dmlc xgboost4j-spark 0.81
org.graphframes graphframes_2.11 0.7.0-db1-spark2.4
org.tensorflow libtensorflow 1.12.0
org.tensorflow libtensorflow_jni 1.12.0
org.tensorflow spark-tensorflow-connector_2.11 1.12.0
org.tensorflow tensorflow 1.12.0
org.tensorframes tensorframes 0.6.0-s_2.11