Databricks Runtime 5.3 ML (Unsupported)

Databricks released this image in April 2019.

Databricks Runtime 5.3 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 5.3 (Unsupported). Databricks Runtime for ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.3 ML is built on top of Databricks Runtime 5.3. For information on what's new in Databricks Runtime 5.3, see the Databricks Runtime 5.3 (Unsupported) release notes. In addition to library updates, Databricks Runtime 5.3 ML introduces the following new features:

  • MLflow + Apache Spark MLlib integration: Databricks Runtime 5.3 ML supports automatic logging of MLflow runs for models fit using the PySpark tuning algorithms CrossValidator and TrainValidationSplit.

    Important

    This feature is in Private Preview. Contact your Azure Databricks sales representative to learn about enabling it.

  • Upgrades the following libraries to the latest version:

    • PyArrow from 0.8.0 to 0.12.1: BinaryType is supported by Arrow-based conversion and can be used in PandasUDF.
    • Horovod from 0.15.2 to 0.16.0.
    • TensorboardX from 1.4 to 1.6.
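The BinaryType support mentioned for PyArrow 0.12.1 can be illustrated with a scalar pandas UDF that takes a binary column and returns a binary column. This is a minimal sketch, not code from the release notes: the function names `sha256_digest`, `make_digest_udf`, and `demo`, and the column name `payload`, are hypothetical. The pyspark imports are placed inside the functions so that the helper can be loaded on a machine without Spark.

```python
import hashlib


def sha256_digest(payload):
    """Return the raw 32-byte SHA-256 digest of a bytes payload."""
    return hashlib.sha256(payload).digest()


def make_digest_udf():
    """Build a scalar pandas UDF mapping a binary column to its SHA-256 digest.

    pyspark is imported lazily; call this only where pyspark and PyArrow
    are installed (for example, on the cluster).
    """
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import BinaryType

    @pandas_udf(BinaryType(), PandasUDFType.SCALAR)
    def digest_udf(payloads):
        # payloads is a pandas.Series of bytes; return a Series of bytes.
        return payloads.apply(sha256_digest)

    return digest_udf


def demo(spark):
    """Apply the UDF to a tiny DataFrame (hypothetical usage; needs a SparkSession)."""
    df = spark.createDataFrame(
        [(bytearray(b"abc"),), (bytearray(b"xyz"),)], ["payload"]
    )
    return df.select(make_digest_udf()("payload").alias("digest"))
```

In a notebook, `demo(spark).show()` would run the UDF end to end, assuming `spark` is the active SparkSession.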

The Databricks ML Model Export API has been deprecated. Azure Databricks recommends using MLeap instead, which provides broader coverage of MLlib model types. For more information, see MLeap ML Model Export.
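As a rough illustration of moving to MLeap, the sketch below serializes a fitted MLlib model to an MLeap bundle. It assumes the MLeap Python package's `serializeToBundle` method, which becomes available on MLlib transformers after importing `mleap.pyspark`. The helper names `bundle_uri` and `export_with_mleap` and the example path are hypothetical, and the exact call should be checked against the MLeap version installed on the cluster.

```python
def bundle_uri(local_zip_path):
    """Return the 'jar:file:' URI used to address a zip bundle on local disk."""
    if not local_zip_path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % local_zip_path)
    return "jar:file:" + local_zip_path


def export_with_mleap(fitted_model, sample_df, local_zip_path):
    """Serialize a fitted MLlib model or PipelineModel to an MLeap bundle.

    The MLeap imports are lazy so this module loads without Spark or MLeap;
    call this function only on a cluster where both are installed.
    """
    # Importing these modules attaches serializeToBundle to MLlib transformers.
    import mleap.pyspark  # noqa: F401
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

    # MLeap uses a transformed DataFrame to record the output schema.
    transformed = fitted_model.transform(sample_df)
    fitted_model.serializeToBundle(bundle_uri(local_zip_path), transformed)
    return local_zip_path
```

A call such as `export_with_mleap(model, df, "/tmp/model.zip")` would write the bundle to the driver's local disk; the path here is only an example.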

Note

In addition, Databricks Runtime 5.3 contains a new FUSE mount optimized for data loading, model checkpointing, and logging from each worker to a shared storage location file:/dbfs/ml, which provides high-performance I/O for deep learning workloads. See Load data.
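Because the mount is exposed as a local path, checkpoints can be written with ordinary file I/O. The following is a minimal sketch under stated assumptions: the `checkpoints/<run_name>` layout, the file naming, and the helper names are hypothetical, and the fallback to a temporary directory exists only so the code can be tried off-cluster.

```python
import os
import tempfile

# Local view of the shared location file:/dbfs/ml described in the note above.
ML_FUSE_ROOT = "/dbfs/ml"


def checkpoint_dir(run_name, root=None):
    """Create and return a checkpoint directory for run_name.

    Uses /dbfs/ml when it exists; otherwise falls back to the system temp
    directory (an assumption made for local experimentation).
    """
    if root is None:
        root = ML_FUSE_ROOT if os.path.isdir(ML_FUSE_ROOT) else tempfile.gettempdir()
    path = os.path.join(root, "checkpoints", run_name)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


def save_checkpoint(run_name, step, payload, root=None):
    """Write a bytes payload as a step-numbered checkpoint file and return its path."""
    path = os.path.join(checkpoint_dir(run_name, root), "step-%06d.ckpt" % step)
    with open(path, "wb") as handle:
        handle.write(payload)
    return path
```

Deep learning frameworks that accept a directory for checkpoints or logs could be pointed at the directory returned by `checkpoint_dir` in the same way.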

Maintenance updates

See Databricks Runtime 5.4 ML maintenance updates.

System environment

The system environment in Databricks Runtime 5.3 ML differs from Databricks Runtime 5.3 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • DBUtils: Databricks Runtime 5.3 ML does not contain Library utilities.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • CUDNN 7.2.1

Libraries

The following sections list the libraries included in Databricks Runtime 5.3 ML that differ from those included in Databricks Runtime 5.3.

Top-tier libraries

Databricks Runtime 5.3 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 5.3 ML uses Conda for Python package management. As a result, there are major differences in pre-installed Python libraries compared to Databricks Runtime. The following is a full list of provided Python packages and versions installed using the Conda package manager.

Library  Version  Library  Version  Library  Version
absl-py  0.7.0  argparse  1.4.0  asn1crypto  0.24.0
astor  0.7.1  backports-abc  0.5  backports.functools-lru-cache  1.5
backports.weakref  1.0.post1  bcrypt  3.1.6  bleach  2.1.3
boto  2.48.0  boto3  1.7.62  botocore  1.10.62
certifi  2018.04.16  cffi  1.11.5  chardet  3.0.4
cloudpickle  0.5.3  colorama  0.3.9  configparser  3.5.0
cryptography  2.2.2  cycler  0.10.0  Cython  0.28.2
decorator  4.3.0  docutils  0.14  entrypoints  0.2.3
enum34  1.1.6  et-xmlfile  1.0.1  funcsigs  1.0.2
functools32  3.2.3-2  fusepy  2.0.4  futures  3.2.0
gast  0.2.2  grpcio  1.12.1  h5py  2.8.0
horovod  0.16.0  html5lib  1.0.1  idna  2.6
ipaddress  1.0.22  ipython  5.7.0  ipython_genutils  0.2.0
jdcal  1.4  Jinja2  2.10  jmespath  0.9.3
jsonschema  2.6.0  jupyter-client  5.2.3  jupyter-core  4.4.0
Keras  2.2.4  Keras-Applications  1.0.6  Keras-Preprocessing  1.0.5
kiwisolver  1.0.1  linecache2  1.0.0  llvmlite  0.23.1
lxml  4.2.1  Markdown  3.0.1  MarkupSafe  1.0
matplotlib  2.2.2  mistune  0.8.3  mleap  0.8.1
mock  2.0.0  msgpack  0.5.6  nbconvert  5.3.1
nbformat  4.4.0  nose  1.3.7  nose-exclude  0.5.0
numba  0.38.0+0.g2a2b772fc.dirty  numpy  1.14.3  olefile  0.45.1
openpyxl  2.5.3  pandas  0.23.0  pandocfilters  1.4.2
paramiko  2.4.1  pathlib2  2.3.2  patsy  0.5.0
pbr  5.1.1  pexpect  4.5.0  pickleshare  0.7.4
Pillow  5.1.0  pip  10.0.1  ply  3.11
prompt-toolkit  1.0.15  protobuf  3.6.1  psutil  5.6.0
psycopg2  2.7.5  ptyprocess  0.5.2  pyarrow  0.12.1
pyasn1  0.4.5  pycparser  2.18  Pygments  2.2.0
PyNaCl  1.3.0  pyOpenSSL  18.0.0  pyparsing  2.2.0
PySocks  1.6.8  Python  2.7.15  python-dateutil  2.7.3
pytz  2018.4  PyYAML  3.12  pyzmq  17.0.0
requests  2.18.4  s3transfer  0.1.13  scandir  1.7
scikit-learn  0.19.1  scipy  1.1.0  seaborn  0.8.1
setuptools  39.1.0  simplegeneric  0.8.1  singledispatch  3.4.0.3
six  1.11.0  statsmodels  0.9.0  subprocess32  3.5.3
tensorboard  1.12.2  tensorboardX  1.6  tensorflow  1.12.0
termcolor  1.1.0  testpath  0.3.1  torch  0.4.1
torchvision  0.2.1  tornado  5.0.2  traceback2  1.4.0
traitlets  4.3.2  unittest2  1.1.0  urllib3  1.22
virtualenv  16.0.0  wcwidth  0.1.7  webencodings  0.5.1
Werkzeug  0.14.1  wheel  0.31.1  wrapt  1.10.11
wsgiref  0.1.2
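One way to confirm that a cluster matches the list above is to compare a few installed versions against it. The sketch below is illustrative only: the `EXPECTED` subset and the helper names are hypothetical choices, and the lookup relies on `pkg_resources` from setuptools, returning None when a package or `pkg_resources` itself is unavailable.

```python
# A small, hand-picked subset of the versions listed above.
EXPECTED = {
    "horovod": "0.16.0",
    "Keras": "2.2.4",
    "pyarrow": "0.12.1",
    "tensorflow": "1.12.0",
    "torch": "0.4.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if unavailable."""
    try:
        import pkg_resources  # provided by setuptools

        return pkg_resources.get_distribution(name).version
    except Exception:
        # Covers a missing distribution as well as a missing pkg_resources.
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (expected_version, found_version)} for every mismatch."""
    mismatches = {}
    for name, wanted in sorted(expected.items()):
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` in a notebook returns an empty dictionary when every listed package matches; the `lookup` parameter exists so the comparison can be exercised without the packages installed.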

In addition, the following Spark packages include Python modules:

Spark Package  Python Module  Version
graphframes  graphframes  0.7.0-db1-spark2.4
spark-deep-learning  sparkdl  1.5.0-db1-spark2.4
tensorframes  tensorframes  0.6.0-s_2.11

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 5.3.

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 5.3, Databricks Runtime 5.3 ML contains the following JARs:

Group ID  Artifact ID  Version
com.databricks  spark-deep-learning  1.5.0-db1-spark2.4
com.typesafe.akka  akka-actor_2.11  2.3.11
ml.combust.mleap  mleap-databricks-runtime_2.11  0.13.0
ml.dmlc  xgboost4j  0.81
ml.dmlc  xgboost4j-spark  0.81
org.graphframes  graphframes_2.11  0.7.0-db1-spark2.4
org.tensorflow  libtensorflow  1.12.0
org.tensorflow  libtensorflow_jni  1.12.0
org.tensorflow  spark-tensorflow-connector_2.11  1.12.0
org.tensorflow  tensorflow  1.12.0
org.tensorframes  tensorframes  0.6.0-s_2.11