May 2020

These features and Azure Databricks platform improvements were released in May 2020.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Easv4-series VMs (Beta)

May 29, 2020

Azure Databricks now provides Beta support for Easv4-series virtual machines (VMs), which use a premium SSD and can achieve a boosted maximum frequency of 3.35 GHz. These instance types can optimize workload performance for memory-intensive enterprise applications.
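The following is a minimal sketch of creating a cluster on an Easv4-series node type through the Clusters API 2.0. The workspace URL, token, runtime version, and the node_type_id value (Standard_E8as_v4) are hypothetical placeholders; check which Easv4 node types are offered in your workspace before using them.

# Minimal sketch (hypothetical workspace URL, token, and node type id).
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

payload = {
    "cluster_name": "easv4-beta-test",
    "spark_version": "6.6.x-scala2.11",     # any runtime supported in your workspace
    "node_type_id": "Standard_E8as_v4",     # assumed Easv4-series node type id
    "num_workers": 2,
}

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])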

Databricks Runtime 6.6 for Genomics GA

May 26, 2020

Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:

  • GFF3 reader (see the sketch after this list)
  • Custom reference genome support
  • Per-sample pipeline timeouts
  • BAM export option
  • Manifest blobs
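The following is a minimal sketch of using the GFF3 reader from a notebook, where spark is the active SparkSession. The "gff" data source short name and the file path are assumptions; confirm the exact reader name in the Databricks Runtime 6.6 for Genomics release notes.

# Minimal sketch (assumed data source name and hypothetical path).
df = spark.read.format("gff").load("/mnt/genomics/annotations/sample.gff3")
df.printSchema()
df.show(5)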

For more information, see the complete Databricks Runtime 6.6 for Genomics (Unsupported) release notes.

Databricks Runtime 6.6 ML GA

May 26, 2020

Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:

  • Upgraded mlflow: 1.7.0 to 1.8.0

For more information, see the complete Databricks Runtime 6.6 ML (Unsupported) release notes.

Databricks Runtime 6.6 GA

May 26, 2020

Databricks Runtime 6.6 brings many library upgrades and new features, including the following Delta Lake features:

  • You can now evolve the schema of a table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution, and the sketch after this list.
  • The performance of merge operations that have only matched clauses (that is, only update and delete actions and no insert action) has been improved.
  • Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.
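The following is a minimal sketch, run from a notebook where spark is the active SparkSession, of upserting change data with automatic schema evolution and of converting a metastore Parquet table. The source path, target path, join key, and table identifier are hypothetical placeholders; the autoMerge configuration key comes from the Delta Lake documentation for this release.

# Minimal sketch (hypothetical paths, join key, and table identifier).
from delta.tables import DeltaTable

# Enable automatic schema evolution for merge.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

changes = spark.read.json("/mnt/raw/customer_changes")        # hypothetical change feed

target = DeltaTable.forPath(spark, "/mnt/delta/customers")    # hypothetical Delta table
(target.alias("t")
  .merge(changes.alias("s"), "t.customer_id = s.customer_id")
  .whenMatchedUpdateAll()     # columns new in the source are added to the target schema
  .whenNotMatchedInsertAll()
  .execute())

# Convert a Parquet table registered in the Hive metastore by its table identifier.
spark.sql("CONVERT TO DELTA sales_db.parquet_events")         # hypothetical identifier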

For more information, see the complete Databricks Runtime 6.6 (Unsupported) release notes.

DBFS REST API delete endpoint size limit

May 21-28, 2020: Version 3.20

When you delete a large number of files recursively using the DBFS API, the delete operation is performed in increments. The call returns a response after approximately 45 seconds with an error message asking you to re-invoke the delete operation until the directory structure is fully deleted. For example:

{
  "error_code": "PARTIAL_DELETE",
  "message": "The requested operation has deleted 324 files. There are more files remaining. You must make another request to delete more."
}
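The following is a minimal sketch of the re-invocation pattern for recursive deletes. The workspace URL, token, and DBFS path are hypothetical placeholders; the loop simply repeats the call while the API reports a PARTIAL_DELETE error.

# Minimal sketch (hypothetical workspace URL, token, and path).
import requests

host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"
path = "/mnt/tmp/large-directory"

while True:
    resp = requests.post(
        f"{host}/api/2.0/dbfs/delete",
        headers={"Authorization": f"Bearer {token}"},
        json={"path": path, "recursive": True},
    )
    body = resp.json() if resp.content else {}
    if body.get("error_code") == "PARTIAL_DELETE":
        continue    # some files were deleted; more remain, so call again
    resp.raise_for_status()
    break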

Easily view large numbers of MLflow registered models

May 21-28, 2020: Version 3.20

The MLflow Model Registry now supports server-side search and pagination for registered models, which enables organizations with large numbers of models to perform listing and search efficiently. As before, you can search models by name and get results ordered by name or last updated time. However, if you have a large number of models, the pages load much faster, and search fetches the most up-to-date view of models.
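As a minimal sketch, registered models can also be enumerated programmatically from a notebook with the MLflow client; list_registered_models is the standard client call for listing registered models, and the printed field is just an example of what you might inspect.

# Minimal sketch: enumerate registered models with the MLflow 1.x client.
from mlflow.tracking import MlflowClient

client = MlflowClient()
for rm in client.list_registered_models():
    print(rm.name)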

Libraries configured to be installed on all clusters are not installed on clusters running Databricks Runtime 7.0 and above

May 21-28, 2020: Version 3.20

In Databricks Runtime 7.0 and above, the underlying version of Apache Spark uses Scala 2.12. Because libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status of Skipped and a deprecation message related to the changes in library handling.

If you have a cluster that was created on an earlier version of Databricks Runtime, before version 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters are installed on that cluster. In this case, any incompatible JARs in the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.

Databricks Runtime 7.0 for Genomics (Beta)

May 21, 2020

Databricks Runtime 7.0 for Genomics is built on top of Databricks Runtime 7.0 and includes the following library changes:

  • The ADAM library has been updated from version 0.30.0 to 0.32.0.
  • The Hail library is not included in Databricks Runtime 7.0 for Genomics because there is no release based on Apache Spark 3.0.

For more information, see the complete Databricks Runtime 7.0 for Genomics (Unsupported) release notes.

Databricks Runtime 7.0 ML (Beta)

May 21, 2020

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0 and includes the following new features:

  • Notebook-scoped Python libraries and custom environments managed by conda and pip commands (see the sketch after this list).
  • Updates for major Python packages, including tensorflow, tensorboard, pytorch, xgboost, sparkdl, and hyperopt.
  • Newly added Python packages: lightgbm, nltk, petastorm, and plotly.
  • RStudio Server Open Source v1.2.
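A minimal sketch of the notebook-scoped library commands is shown below; the package name is just an example, and the %pip command runs in its own notebook cell.

%pip install beautifulsoup4

# In a later cell: the library is importable only within this notebook.
import bs4
print(bs4.__version__)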

For more information, see the complete Databricks Runtime 7.0 ML (Unsupported) release notes.

Databricks Runtime 6.6 for Genomics (Beta)

May 7, 2020

Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:

  • GFF3 reader
  • Custom reference genome support
  • Per-sample pipeline timeouts
  • BAM export option
  • Manifest blobs

For more information, see the complete Databricks Runtime 6.6 for Genomics (Unsupported) release notes.

Databricks Runtime 6.6 ML (Beta)

May 7, 2020

Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:

  • Upgraded mlflow: 1.7.0 to 1.8.0

For more information, see the complete Databricks Runtime 6.6 ML (Unsupported) release notes.

Databricks Runtime 6.6 (Beta)

May 7, 2020

Databricks Runtime 6.6 (Beta) brings many library upgrades and new features, including the following Delta Lake features:

  • You can now evolve the schema of a table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution.
  • The performance of merge operations that have only matched clauses (that is, only update and delete actions and no insert action) has been improved.
  • Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.

For more information, see the complete Databricks Runtime 6.6 (Unsupported) release notes.

Job clusters now tagged with job name and ID

May 5-12, 2020: Version 3.19

Job clusters are automatically tagged with the job name and ID. The tags appear in the billable usage reports so that you can easily attribute your DBU usage by job and identify anomalies. The tags are sanitized to cluster tag specifications, such as allowed characters, maximum size, and maximum number of tags. The job name is contained in the RunName tag, and the job ID is contained in the JobId tag.

Restore deleted notebook cells

May 5-12, 2020: Version 3.19

You can now restore deleted cells either by using the (Z) keyboard shortcut or by selecting Edit > Undo Delete Cells.

Jobs pending queue limit

May 5-12, 2020: Version 3.19

A workspace is now limited to 1000 active (running and pending) job runs. Because a workspace is limited to 150 concurrent (running) job runs, a workspace can have up to 850 runs in the pending queue.