June 2019

These features and Azure Databricks platform improvements were released in June 2019.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Lsv2 instance support is generally available

June 24 - 26, 2019: Version 2.100

Azure Databricks now provides full support for the Lsv2 VM series for high-throughput and high-IOPS workloads.

RStudio integration no longer limited to high concurrency clusters

June 6 - 11, 2019: Version 2.99

You can now enable RStudio Server on standard clusters in Azure Databricks, in addition to the high-concurrency clusters that were already supported. Regardless of cluster mode, RStudio Server integration continues to require that you disable the automatic termination option for your cluster. See RStudio on Azure Databricks.

MLflow 1.0

June 3, 2019

MLflow is an open source platform to manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments locally or in the cloud, package and share models across frameworks, and deploy models virtually anywhere.

We are excited to announce the release of MLflow 1.0 today. The 1.0 release not only marks the maturity and stability of the APIs, but also adds a number of frequently requested features and improvements:

  • The CLI was reorganized and now has dedicated commands for artifacts, models, db (the tracking database), and server (the tracking server).
  • Tracking server search supports a simplified version of the SQL WHERE clause. In addition to supporting run metrics and params, search has been enhanced to support some run attributes and user and system tags.
  • Adds support for x coordinates in the Tracking API. The MLflow UI visualization components now also support plotting metrics against provided x-coordinate values.
  • Adds a runs/log-batch REST API endpoint as well as Python, R, and Java methods for logging multiple metrics, parameters, and tags with a single API request.
  • For tracking, the MLflow 1.0 client is now supported on Windows.
  • Adds support for HDFS as an artifact store backend.
  • Adds a command to build a Docker container whose default entry point serves the specified MLflow Python function model at port 8080 within the container.
  • Adds an experimental ONNX model flavor.
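
As a rough illustration of the batch logging item, the sketch below assembles the kind of JSON body a single runs/log-batch request might carry. The field names (`run_id`, `metrics`, `params`, `tags`, `timestamp`, `step`) and the `/api/2.0/mlflow/runs/log-batch` path are assumptions based on the MLflow REST API as commonly documented and should be checked against the REST reference for your MLflow version. The helper `build_log_batch_payload` and the run ID are hypothetical. Nothing is sent over the network.

```python
import json
import time


def build_log_batch_payload(run_id, metrics=None, params=None, tags=None):
    """Assemble the body of one log-batch request (hypothetical helper).

    metrics: iterable of (key, value, step) tuples; step is the x coordinate
    params, tags: dicts mapping key -> value
    """
    now_ms = int(time.time() * 1000)  # assumed: timestamps in milliseconds
    return {
        "run_id": run_id,
        "metrics": [
            {"key": k, "value": float(v), "timestamp": now_ms, "step": int(step)}
            for k, v, step in (metrics or [])
        ],
        "params": [{"key": k, "value": str(v)} for k, v in (params or {}).items()],
        "tags": [{"key": k, "value": str(v)} for k, v in (tags or {}).items()],
    }


payload = build_log_batch_payload(
    run_id="hypothetical-run-id",  # placeholder, not a real run
    metrics=[("loss", 0.42, 0), ("loss", 0.31, 1)],
    params={"learning_rate": 0.01},
    tags={"team": "example"},
)

# Serialized body; it would be POSTed to
# <tracking-uri>/api/2.0/mlflow/runs/log-batch (assumed path).
body = json.dumps(payload)
```

Grouping metrics, parameters, and tags into one body is what lets a client replace many small logging calls with a single request.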

You can view the full list of changes in the MLflow changelog.
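
To make the simplified WHERE-clause search more concrete, here is a minimal sketch of a helper that composes a filter string. It assumes the commonly documented `metrics.<key>`, `params.<key>`, and `tags.<key>` prefixes joined with `and`; the helper name `build_run_filter` and the keys used are hypothetical, and the exact grammar should be confirmed in the MLflow search documentation.

```python
def build_run_filter(metrics=None, params=None, tags=None):
    """Compose an MLflow-style run filter string (hypothetical helper).

    metrics: dict mapping key -> (comparison operator, numeric value)
    params, tags: dicts mapping key -> string value (equality only)
    """
    clauses = []
    for key, (op, value) in (metrics or {}).items():
        clauses.append(f"metrics.{key} {op} {value}")
    for key, value in (params or {}).items():
        clauses.append(f"params.{key} = '{value}'")
    for key, value in (tags or {}).items():
        clauses.append(f"tags.{key} = '{value}'")
    # Assumed: clauses are combined with "and" only.
    return " and ".join(clauses)


filter_string = build_run_filter(
    metrics={"rmse": ("<", 0.5)},
    params={"model": "tree"},
    tags={"stage": "dev"},
)
print(filter_string)
```

The resulting string is the kind of value a tracking client would pass as its run search filter.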

Databricks Runtime 5.4 with Conda (Beta)

June 3, 2019

Important

Databricks Runtime with Conda is in Beta. The contents of the supported environments may change in upcoming Beta releases. Changes can include the list of packages or versions of installed packages. Databricks Runtime 5.4 with Conda is built on top of Databricks Runtime 5.4 (Unsupported).

We’re excited to introduce Databricks Runtime 5.4 with Conda, which lets you take advantage of Conda to manage Python libraries and environments. This runtime offers two root Conda environment options at cluster creation:

  • Databricks Standard environment includes updated versions of many popular Python packages. This environment is intended as a drop-in replacement for existing notebooks that run on Databricks Runtime. This is the default Databricks Conda-based runtime environment.
  • Databricks Minimal environment contains the minimum packages required for PySpark and Databricks Python notebook functionality. This environment is ideal if you want to customize the runtime with various Python packages.

See the complete release notes at Databricks Runtime 5.4 with Conda (Beta).

Databricks Runtime 5.4 for Machine Learning

June 3, 2019

Databricks Runtime 5.4 ML is built on top of Databricks Runtime 5.4 (Unsupported). It contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost, and provides distributed TensorFlow training using Horovod.

It includes the following new features:

  • MLlib integration with MLflow (Public Preview).
  • Hyperopt with new SparkTrials class pre-installed (Public Preview).
  • HorovodRunner output sent from Horovod to the Spark driver node is now visible in notebook cells.
  • XGBoost Python package pre-installed.

For details, see Databricks Runtime 5.4 for Machine Learning (Unsupported).

Databricks Runtime 5.4

June 3, 2019

Databricks Runtime 5.4 is now available. Databricks Runtime 5.4 includes Apache Spark 2.4.2, upgraded Python, R, Java, and Scala libraries, and the following new features:

  • Delta Lake on Databricks adds Auto Optimize (Public Preview)
  • Use your favorite IDE and notebook server with Databricks Connect
  • Library utilities generally available
  • Binary file data source
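
As a hedged illustration of the Auto Optimize item, the sketch below renders a statement that turns the feature on for one table. The property names `delta.autoOptimize.optimizeWrite` and `delta.autoOptimize.autoCompact` are assumptions taken from the Databricks Delta documentation as commonly cited, not from these release notes. The helper `auto_optimize_statement` and the table name `events` are hypothetical. The code only builds a string and does not contact a cluster.

```python
def auto_optimize_statement(table_name, optimize_write=True, auto_compact=True):
    """Render an ALTER TABLE statement for Auto Optimize (hypothetical helper)."""
    # Assumed property names; verify against the Delta Lake on Databricks docs.
    props = {
        "delta.autoOptimize.optimizeWrite": optimize_write,
        "delta.autoOptimize.autoCompact": auto_compact,
    }
    rendered = ", ".join(f"{key} = {str(value).lower()}" for key, value in props.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


statement = auto_optimize_statement("events")  # "events" is a placeholder table
print(statement)
```

In a Databricks notebook, a string like this could be passed to `spark.sql(...)` against an existing Delta table.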

For details, see Databricks Runtime 5.4 (Unsupported).