Databricks Runtime 5.4 ML (Unsupported)

Databricks released this image in June 2019.

Databricks Runtime 5.4 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 5.4 (Unsupported). Databricks Runtime for ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.4 ML is built on top of Databricks Runtime 5.4. For information on what's new in Databricks Runtime 5.4, see the Databricks Runtime 5.4 (Unsupported) release notes.

In addition to library updates, Databricks Runtime 5.4 ML introduces the following new features:

Distributed Hyperopt + automated MLflow tracking

Databricks Runtime 5.4 ML introduces a new implementation of Hyperopt powered by Apache Spark to scale and simplify hyperparameter tuning. A new Trials class, SparkTrials, distributes Hyperopt trial runs across multiple machines and nodes using Apache Spark. In addition, all tuning experiments, along with the tuned hyperparameters and targeted metrics, are automatically logged to MLflow runs. See Distributed Hyperopt and automated MLflow tracking.


This feature is in Public Preview.
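As a rough illustration of the SparkTrials workflow described above, the following sketch tunes a toy objective. The objective, the search range, and the `max_evals` and `parallelism` values are hypothetical, and the `SparkTrials(parallelism=...)` argument follows the open-source Hyperopt API, which is assumed here to match the version bundled with this runtime. The Hyperopt import is placed inside the function so the file can still be loaded on a machine without Hyperopt or Spark.

```python
def objective(x):
    """Toy loss with its minimum at x = 3; stands in for a real training run."""
    return (x - 3.0) ** 2


def tune(max_evals=32, parallelism=4):
    """Run a distributed search; intended for a cluster with Hyperopt and Spark."""
    # Imported lazily so this file still loads where Hyperopt is not installed.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # SparkTrials sends trials to Spark workers; `parallelism` (an assumed
    # argument name, see lead-in) caps how many trials run at the same time.
    spark_trials = SparkTrials(parallelism=parallelism)

    best = fmin(
        fn=objective,
        space=hp.uniform("x", -10.0, 10.0),  # illustrative search range
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=spark_trials,
    )
    return best  # a dict keyed by the search-space label, here "x"
```

In a notebook attached to a Databricks Runtime 5.4 ML cluster, calling `tune()` would start the search; the returned dictionary holds the best value found for `x`.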

Apache Spark MLlib + automated MLflow tracking

Databricks Runtime 5.4 ML supports automatic logging of MLflow runs for models fit using the PySpark tuning algorithms CrossValidator and TrainValidationSplit. See Apache Spark MLlib and automated MLflow tracking. This feature is on by default in Databricks Runtime 5.4 ML but was off by default in Databricks Runtime 5.3 ML.


This feature is in Public Preview.
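For orientation, a CrossValidator fit of the kind this feature applies to might look like the sketch below. The estimator, the grid values, and the `features` and `label` column names are assumptions made for illustration; nothing MLflow-specific appears in the code because the logging is described above as automatic. The PySpark imports are placed inside the function so the file loads even where PySpark is absent.

```python
# Illustrative hyperparameter values; not taken from the release notes.
REG_PARAMS = [0.01, 0.1]
ELASTIC_NET_PARAMS = [0.0, 0.5]


def fit_with_cross_validation(train_df, num_folds=3):
    """Fit a logistic regression with CrossValidator on a Spark DataFrame.

    `train_df` is assumed to have a vector column "features" and a binary
    column "label".
    """
    # Imported lazily so this file still loads where PySpark is not installed.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    lr = LogisticRegression(featuresCol="features", labelCol="label")

    # Every combination of the two lists becomes one candidate model.
    grid = (
        ParamGridBuilder()
        .addGrid(lr.regParam, REG_PARAMS)
        .addGrid(lr.elasticNetParam, ELASTIC_NET_PARAMS)
        .build()
    )

    cv = CrossValidator(
        estimator=lr,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(labelCol="label"),
        numFolds=num_folds,
    )
    return cv.fit(train_df)  # a CrossValidatorModel
```

TrainValidationSplit accepts the same `estimator`, `estimatorParamMaps`, and `evaluator` arguments, so the sketch adapts to it by swapping the tuning class and replacing `numFolds` with `trainRatio`.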

HorovodRunner improvement

Output sent from Horovod to the Spark driver node is now visible in notebook cells.
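A minimal sketch of a HorovodRunner launch whose output would be affected by this change is shown below. The training function is a placeholder that only initializes Horovod and prints its rank; the process count and the `learning_rate` argument are hypothetical. The imports are placed inside the functions so the file loads even without Horovod or sparkdl.

```python
def train(learning_rate=0.1):
    """Placeholder training function executed by each Horovod process."""
    # Imported lazily so this file still loads where Horovod is not installed.
    import horovod.tensorflow as hvd

    hvd.init()
    # Output like this, once forwarded to the driver, is what the notebook
    # cell is described as displaying.
    print("worker {} of {} started, lr={}".format(
        hvd.rank(), hvd.size(), learning_rate))


def launch(num_processes=2):
    """Start `train` on `num_processes` Horovod processes via HorovodRunner."""
    from sparkdl import HorovodRunner

    hr = HorovodRunner(np=num_processes)
    hr.run(train, learning_rate=0.1)
```

Calling `launch()` from a notebook cell on a Databricks Runtime 5.4 ML cluster would run the placeholder on two processes.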

XGBoost Python package update

XGBoost Python package 0.80 is installed.

System environment

The system environment in Databricks Runtime 5.4 ML differs from Databricks Runtime 5.4 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • DBUtils: Databricks Runtime 5.4 ML does not contain Library utilities.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • CUDNN 7.2.1
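One way to confirm that a notebook is running on the Python version listed above is a small check like the following. The helper name and the lookup table are hypothetical conveniences; only the two version numbers come from the list.

```python
import sys

# Python versions listed for this runtime, keyed by major version.
DOCUMENTED_PYTHON = {2: (2, 7, 15), 3: (3, 6, 5)}


def matches_documented_python(version_info=None):
    """Return True if the interpreter version equals the documented one."""
    # Use the running interpreter unless a version tuple is passed in.
    version = tuple((version_info or sys.version_info)[:3])
    return DOCUMENTED_PYTHON.get(version[0]) == version


print(matches_documented_python())
```

The check prints `True` only on an interpreter whose major, minor, and micro versions all match the list.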


The following sections list the libraries included in Databricks Runtime 5.4 ML that differ from those included in Databricks Runtime 5.4.

Top-tier libraries

Databricks Runtime 5.4 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 5.4 ML uses Conda for Python package management. As a result, there are major differences in installed Python libraries compared to Databricks Runtime. The following is a full list of provided Python packages and versions installed using the Conda package manager.

| Library | Version | Library | Version | Library | Version |
| absl-py | | argparse | | asn1crypto | |
| astor | | backports-abc | 0.5 | backports.functools-lru-cache | 1.5 |
| backports.weakref | 1.0.post1 | bcrypt | | bleach | |
| boto | | boto3 | 1.7.62 | botocore | 1.10.62 |
| certifi | 2018.04.16 | cffi | | chardet | |
| cloudpickle | | colorama | | configparser | |
| cryptography | | cycler | | Cython | |
| decorator | | docutils | 0.14 | entrypoints | |
| enum34 | | et-xmlfile | | funcsigs | |
| functools32 | 3.2.3-2 | fusepy | | future | |
| futures | | gast | | grpcio | |
| h5py | | horovod | | html5lib | |
| hyperopt | 0.1.2.db4 | idna | 2.6 | ipaddress | |
| ipython | | ipython_genutils | | jdcal | 1.4 |
| Jinja2 | 2.10 | jmespath | | jsonschema | |
| jupyter-client | | jupyter-core | | Keras | |
| Keras-Applications | | Keras-Preprocessing | | kiwisolver | |
| linecache2 | | llvmlite | | lxml | |
| Markdown | | MarkupSafe | 1.0 | matplotlib | |
| mistune | | mkl-fft | | mkl-random | |
| mleap | | mock | | msgpack | |
| nbconvert | | nbformat | | networkx | 2.2 |
| nose | | nose-exclude | | numba | 0.38.0+0.g2a2b772fc.dirty |
| numpy | | olefile | | openpyxl | |
| pandas | | pandocfilters | | paramiko | |
| pathlib2 | | patsy | | pbr | |
| pexpect | | pickleshare | | Pillow | |
| pip | | ply | 3.11 | prompt-toolkit | |
| protobuf | | psutil | | psycopg2 | |
| ptyprocess | | pyarrow | | pyasn1 | |
| pycparser | 2.18 | Pygments | | pymongo | |
| PyNaCl | | pyOpenSSL | | pyparsing | |
| PySocks | | Python | | python-dateutil | |
| pytz | 2018.4 | PyYAML | 5.1 | pyzmq | |
| requests | | s3transfer | | scandir | 1.7 |
| scikit-learn | | scipy | | seaborn | |
| setuptools | | simplegeneric | | singledispatch | |
| six | | statsmodels | | subprocess32 | |
| tensorboard | | tensorboardX | 1.6 | tensorflow | |
| termcolor | | testpath | | torch | |
| torchvision | | tornado | | tqdm | |
| traceback2 | | traitlets | | unittest2 | |
| urllib3 | 1.22 | virtualenv | | wcwidth | |
| webencodings | | Werkzeug | | wheel | |
| wrapt | | wsgiref | | | |

In addition, the following Spark packages include Python modules:

| Spark Package | Python Module | Version |
| graphframes | graphframes | 0.7.0-db1-spark2.4 |
| spark-deep-learning | sparkdl | 1.5.0-db3-spark2.4 |
| tensorframes | tensorframes | 0.6.0-s_2.11 |
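To see which version of a listed Python package is actually installed on a cluster, a lookup along these lines can be run in a notebook cell. The helper name is hypothetical, and the three package names queried at the end are only examples. The fallback to `pkg_resources` is there because `importlib.metadata` does not exist on the Python 2.7 and 3.6 interpreters this runtime uses.

```python
try:
    # Available on Python 3.8 and later.
    from importlib import metadata as _metadata

    def installed_version(name):
        """Return the installed version of `name`, or None if it is absent."""
        try:
            return _metadata.version(name)
        except _metadata.PackageNotFoundError:
            return None

except ImportError:
    # Older interpreters: pkg_resources ships with setuptools.
    import pkg_resources

    def installed_version(name):
        """Return the installed version of `name`, or None if it is absent."""
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None


# Example package names; any distribution name can be queried.
for package in ("hyperopt", "horovod", "tensorflow"):
    print("{}: {}".format(package, installed_version(package)))
```

A package that is not installed is reported as `None` rather than raising an error, which keeps the loop usable outside an ML cluster.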

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 5.4.

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 5.4, Databricks Runtime 5.4 ML contains the following JARs:

| Group ID | Artifact ID | Version |
| com.databricks | spark-deep-learning | 1.5.0-db3-spark2.4 |
| com.typesafe.akka | akka-actor_2.11 | |
| ml.combust.mleap | mleap-databricks-runtime_2.11 | |
| ml.dmlc | xgboost4j | 0.81 |
| ml.dmlc | xgboost4j-spark | 0.81 |
| org.graphframes | graphframes_2.11 | 0.7.0-db1-spark2.4 |
| org.tensorflow | libtensorflow | |
| org.tensorflow | libtensorflow_jni | |
| org.tensorflow | spark-tensorflow-connector_2.11 | |
| org.tensorflow | tensorflow | |
| org.tensorframes | tensorframes | 0.6.0-s_2.11 |