Databricks Runtime 5.4 ML (Unsupported)
Databricks released this image in June 2019.
Databricks Runtime 5.4 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 5.4 (Unsupported). Databricks Runtime for ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.
For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.
New features
Databricks Runtime 5.4 ML is built on top of Databricks Runtime 5.4. For information on what's new in Databricks Runtime 5.4, see the Databricks Runtime 5.4 (Unsupported) release notes.
In addition to library updates, Databricks Runtime 5.4 ML introduces the following new features:
Distributed Hyperopt + automated MLflow tracking
Databricks Runtime 5.4 ML introduces a new implementation of Hyperopt powered by Apache Spark to scale and simplify hyperparameter tuning. A new Trials class, SparkTrials, distributes Hyperopt trial runs across multiple machines and nodes using Apache Spark. In addition, all tuning experiments, along with the tuned hyperparameters and targeted metrics, are automatically logged to MLflow runs. See Distributed Hyperopt and automated MLflow tracking.
Important
This feature is in Public Preview.
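The following is a minimal sketch of how a SparkTrials object might be passed to Hyperopt's fmin. The toy objective, the search range, and the max_evals and parallelism values are illustrative assumptions, not settings from these release notes. It also assumes that MLflow is available on the cluster if the runs are to be logged.

```python
def objective(x):
    """Toy loss with its minimum at x = 3; stands in for a real training run."""
    return (x - 3.0) ** 2


def run_distributed_search(max_evals=32, parallelism=4):
    """Minimize objective() with Hyperopt, evaluating trials on Spark workers.

    max_evals and parallelism are illustrative defaults (assumptions).
    """
    # Imported here so objective() can be exercised without Hyperopt or Spark.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # SparkTrials replaces the default Trials object and ships each trial
    # to a Spark task instead of running it on the driver.
    spark_trials = SparkTrials(parallelism=parallelism)

    best = fmin(
        fn=objective,
        space=hp.uniform("x", -10.0, 10.0),
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=spark_trials,
    )
    return best  # a dict keyed by hyperparameter name, here "x"
```

In a notebook attached to a Databricks Runtime 5.4 ML cluster, calling run_distributed_search() would start the search. The only change from single-machine Hyperopt code is the trials argument.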
Apache Spark MLlib + automated MLflow tracking
Databricks Runtime 5.4 ML supports automatic logging of MLflow runs for models fit using the PySpark tuning algorithms CrossValidator and TrainValidationSplit. See Apache Spark MLlib and automated MLflow tracking. This feature is on by default in Databricks Runtime 5.4 ML but was off by default in Databricks Runtime 5.3 ML.
Important
This feature is in Public Preview.
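As a rough sketch, tuning code that would be covered by this logging looks like ordinary PySpark: the code itself makes no MLflow calls. The function name, the column names, and the grid values below are assumptions made for illustration, and the sketch assumes MLflow is available on the cluster.

```python
def tune_logistic_regression(train_df, num_folds=3):
    """Fit a CrossValidator over a small grid and return the fitted model.

    train_df is assumed to be a Spark DataFrame with "features" and
    "label" columns (hypothetical input, not defined in these notes).
    """
    # Imported inside the function so the module loads without Spark installed.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    lr = LogisticRegression(maxIter=10)

    # Four candidate settings: 2 regParam values x 2 elasticNetParam values.
    grid = (
        ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build()
    )

    cv = CrossValidator(
        estimator=lr,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(),
        numFolds=num_folds,
    )

    # No explicit MLflow calls here: per the note above, the runtime is
    # what records the tuning run when fit() is called.
    return cv.fit(train_df)
```

TrainValidationSplit accepts the same estimator, estimatorParamMaps, and evaluator arguments, with trainRatio in place of numFolds.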
HorovodRunner improvement
Output sent from Horovod to the Spark driver node is now visible in notebook cells.
XGBoost Python package update
XGBoost Python package 0.80 is installed.
System environment
The system environment in Databricks Runtime 5.4 ML differs from that of Databricks Runtime 5.4 as follows:
- Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
- DBUtils: Databricks Runtime 5.4 ML does not contain Library utilities.
- For GPU clusters, the following NVIDIA GPU libraries:
  - Tesla driver 396.44
  - CUDA 9.2
  - cuDNN 7.2.1
Libraries
The following sections list the libraries included in Databricks Runtime 5.4 ML that differ from those included in Databricks Runtime 5.4.
Top-tier libraries
Databricks Runtime 5.4 ML includes the following top-tier libraries:
- GraphFrames
- Horovod and HorovodRunner
- PyTorch
- spark-tensorflow-connector
- TensorFlow
- TensorBoard
Python libraries
Databricks Runtime 5.4 ML uses Conda for Python package management. As a result, the installed Python libraries differ substantially from those in Databricks Runtime. The following is a full list of the provided Python packages and versions installed using the Conda package manager.
Library | Version | Library | Version | Library | Version |
---|---|---|---|---|---|
absl-py | 0.7.1 | argparse | 1.4.0 | asn1crypto | 0.24.0 |
astor | 0.7.1 | backports-abc | 0.5 | backports.functools-lru-cache | 1.5 |
backports.weakref | 1.0.post1 | bcrypt | 3.1.6 | bleach | 2.1.3 |
boto | 2.48.0 | boto3 | 1.7.62 | botocore | 1.10.62 |
certifi | 2018.04.16 | cffi | 1.11.5 | chardet | 3.0.4 |
cloudpickle | 0.5.3 | colorama | 0.3.9 | configparser | 3.5.0 |
cryptography | 2.2.2 | cycler | 0.10.0 | Cython | 0.28.2 |
decorator | 4.3.0 | docutils | 0.14 | entrypoints | 0.2.3 |
enum34 | 1.1.6 | et-xmlfile | 1.0.1 | funcsigs | 1.0.2 |
functools32 | 3.2.3-2 | fusepy | 2.0.4 | future | 0.17.1 |
futures | 3.2.0 | gast | 0.2.2 | grpcio | 1.12.1 |
h5py | 2.8.0 | horovod | 0.16.0 | html5lib | 1.0.1 |
hyperopt | 0.1.2.db4 | idna | 2.6 | ipaddress | 1.0.22 |
ipython | 5.7.0 | ipython_genutils | 0.2.0 | jdcal | 1.4 |
Jinja2 | 2.10 | jmespath | 0.9.4 | jsonschema | 2.6.0 |
jupyter-client | 5.2.3 | jupyter-core | 4.4.0 | Keras | 2.2.4 |
Keras-Applications | 1.0.7 | Keras-Preprocessing | 1.0.9 | kiwisolver | 1.1.0 |
linecache2 | 1.0.0 | llvmlite | 0.23.1 | lxml | 4.2.1 |
Markdown | 3.1.1 | MarkupSafe | 1.0 | matplotlib | 2.2.2 |
mistune | 0.8.3 | mkl-fft | 1.0.0 | mkl-random | 1.0.1 |
mleap | 0.8.1 | mock | 2.0.0 | msgpack | 0.5.6 |
nbconvert | 5.3.1 | nbformat | 4.4.0 | networkx | 2.2 |
nose | 1.3.7 | nose-exclude | 0.5.0 | numba | 0.38.0+0.g2a2b772fc.dirty |
numpy | 1.14.3 | olefile | 0.45.1 | openpyxl | 2.5.3 |
pandas | 0.23.0 | pandocfilters | 1.4.2 | paramiko | 2.4.1 |
pathlib2 | 2.3.2 | patsy | 0.5.0 | pbr | 5.1.3 |
pexpect | 4.5.0 | pickleshare | 0.7.4 | Pillow | 5.1.0 |
pip | 10.0.1 | ply | 3.11 | prompt-toolkit | 1.0.15 |
protobuf | 3.7.1 | psutil | 5.6.2 | psycopg2 | 2.7.5 |
ptyprocess | 0.5.2 | pyarrow | 0.12.1 | pyasn1 | 0.4.5 |
pycparser | 2.18 | Pygments | 2.2.0 | pymongo | 3.8.0 |
PyNaCl | 1.3.0 | pyOpenSSL | 18.0.0 | pyparsing | 2.2.0 |
PySocks | 1.6.8 | Python | 2.7.15 | python-dateutil | 2.7.3 |
pytz | 2018.4 | PyYAML | 5.1 | pyzmq | 17.0.0 |
requests | 2.18.4 | s3transfer | 0.1.13 | scandir | 1.7 |
scikit-learn | 0.19.1 | scipy | 1.1.0 | seaborn | 0.8.1 |
setuptools | 39.1.0 | simplegeneric | 0.8.1 | singledispatch | 3.4.0.3 |
six | 1.11.0 | statsmodels | 0.9.0 | subprocess32 | 3.5.4 |
tensorboard | 1.12.2 | tensorboardX | 1.6 | tensorflow | 1.12.0 |
termcolor | 1.1.0 | testpath | 0.3.1 | torch | 0.4.1 |
torchvision | 0.2.1 | tornado | 5.0.2 | tqdm | 4.32.1 |
traceback2 | 1.4.0 | traitlets | 4.3.2 | unittest2 | 1.1.0 |
urllib3 | 1.22 | virtualenv | 16.0.0 | wcwidth | 0.1.7 |
webencodings | 0.5.1 | Werkzeug | 0.14.1 | wheel | 0.31.1 |
wrapt | 1.10.11 | wsgiref | 0.1.2 | | |
In addition, the following Spark packages include Python modules:
Spark Package | Python Module | Version |
---|---|---|
graphframes | graphframes | 0.7.0-db1-spark2.4 |
spark-deep-learning | sparkdl | 1.5.0-db3-spark2.4 |
tensorframes | tensorframes | 0.6.0-s_2.11 |
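To confirm that a cluster matches the Python package list in this section, a short check along the following lines may help. It is a sketch only: the helper names are hypothetical, the EXPECTED dictionary copies just a few rows from the table, and it assumes pkg_resources (shipped with setuptools) can be imported in the notebook environment.

```python
# A few rows copied from the Python package table; extend as needed.
EXPECTED = {
    "numpy": "1.14.3",
    "pandas": "0.23.0",
    "tensorflow": "1.12.0",
    "torch": "0.4.1",
}


def installed_version(name):
    """Return the installed version string for a package, or None if unknown."""
    try:
        import pkg_resources  # provided by setuptools
        return pkg_resources.get_distribution(name).version
    except Exception:
        # Covers both a missing pkg_resources and a missing distribution.
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, actual)."""
    mismatches = {}
    for name, want in sorted(expected.items()):
        have = lookup(name)
        if have != want:
            mismatches[name] = (want, have)
    return mismatches
```

Calling find_mismatches(EXPECTED) returns an empty dictionary when every listed package is installed at the listed version. The lookup parameter exists so the comparison can be tried with a stub instead of the real environment.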
R libraries
The R libraries are identical to the R libraries in Databricks Runtime 5.4.
Java and Scala libraries (Scala 2.11 cluster)
In addition to the Java and Scala libraries in Databricks Runtime 5.4, Databricks Runtime 5.4 ML contains the following JARs:
Group ID | Artifact ID | Version |
---|---|---|
com.databricks | spark-deep-learning | 1.5.0-db3-spark2.4 |
com.typesafe.akka | akka-actor_2.11 | 2.3.11 |
ml.combust.mleap | mleap-databricks-runtime_2.11 | 0.13.0 |
ml.dmlc | xgboost4j | 0.81 |
ml.dmlc | xgboost4j-spark | 0.81 |
org.graphframes | graphframes_2.11 | 0.7.0-db1-spark2.4 |
org.tensorflow | libtensorflow | 1.12.0 |
org.tensorflow | libtensorflow_jni | 1.12.0 |
org.tensorflow | spark-tensorflow-connector_2.11 | 1.12.0 |
org.tensorflow | tensorflow | 1.12.0 |
org.tensorframes | tensorframes | 0.6.0-s_2.11 |