October 2019

These features and Azure Databricks platform improvements were released in October 2019.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Supportability metrics moved to Azure Event Hubs

October 22-29, 2019

The supportability metrics that enable Azure Databricks to monitor cluster health have been migrated from Azure Blob storage to Event Hubs endpoints. This lets Azure Databricks respond to customer incidents with lower latency. For VNet injection workspaces, we have added a rule for the Event Hubs service endpoint to the network security group. Details are available in the Network security group rules table. No action is required for continued availability of services.

For a list of the Azure Databricks supportability metrics Event Hubs endpoints by region, see Metastore, artifact Blob storage, log Blob storage, and Event Hub endpoint IP addresses.

Azure Data Lake Storage credential passthrough on standard clusters and Scala is GA

October 22 - 29, 2019: Version 3.5

Credential passthrough for Python, SQL, and Scala on standard clusters running Databricks Runtime 5.5 and above, as well as for SparkR on Databricks Runtime 6.0 and above, is generally available. See Enable Azure Data Lake Storage credential passthrough for a standard cluster.

Databricks Runtime 6.1 for Genomics GA

October 22, 2019

Databricks Runtime 6.1 for Genomics is generally available. See Databricks Runtime for Genomics.

Databricks Runtime 6.1 for Machine Learning GA

October 22, 2019

Databricks Runtime 6.1 ML is generally available. It includes support for GPU clusters and upgrades to the following machine learning libraries:

  • TensorFlow to 1.14.0
  • PyTorch to 1.2.0
  • Torchvision to 0.4.0
  • MLflow to 1.3.0

For more information, see the complete Databricks Runtime 6.1 ML (Unsupported) release notes.

MLflow API calls are now rate limited

October 22 - 29, 2019: Version 3.5

To ensure high quality of service under heavy load, Azure Databricks now enforces API rate limits for all MLflow API calls. The limits are set per account to ensure fair usage and high availability for all organizations sharing a workspace.

MLflow clients with automatic retries are available starting with MLflow 1.3.0, which is included in Databricks Runtime 6.1 ML (Unsupported). We advise all customers to switch to the latest MLflow client version.
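For callers that cannot upgrade right away, the retry behavior can be approximated by hand. The following is a minimal sketch, not the MLflow client's own implementation. It assumes that a rate-limited call is reported with HTTP status 429 (an assumption, not stated in these notes), and `call` is a hypothetical zero-argument function that returns a `(status, body)` pair.

```python
import random
import time


def call_with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `call` with exponential backoff while it reports rate limiting.

    `call` is a hypothetical zero-argument function returning (status, body).
    Status 429 is assumed to mean "rate limited"; adjust for your client.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay * 2^attempt seconds plus a little random jitter
            # so that many clients do not retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    # Every attempt was rate limited; return the last response unchanged.
    return status, body
```

The `sleep` parameter exists only so that the waiting can be replaced in tests; in normal use the default `time.sleep` applies.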

For details, see MLflow API.

Pools of instances for quick cluster launch generally available

October 22 - 29, 2019: Version 3.5

The Azure Databricks feature that supports attaching a cluster to a predefined pool of idle instances is now generally available.
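As an illustration of attaching a cluster to a pool, the sketch below builds a request body for creating a cluster through the Clusters API. It assumes the `instance_pool_id` field of that API; the cluster name, pool ID, and runtime version string are made-up placeholders, and the helper `pool_cluster_spec` is hypothetical.

```python
import json


def pool_cluster_spec(name, pool_id, spark_version, num_workers):
    """Build a cluster-create request body that draws instances from a pool.

    No node type is given here on the assumption that the pool determines
    the instance type; check the Clusters API reference before relying on it.
    """
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "instance_pool_id": pool_id,  # placeholder pool ID
        "num_workers": num_workers,
    }


# Placeholder values for illustration only.
spec = pool_cluster_spec("nightly-etl", "pool-1234", "6.1.x-scala2.11", 2)
body = json.dumps(spec)
```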

Azure Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply; see pricing.

For details, see Pools.

Databricks Runtime 6.1 GA

October 16, 2019

Databricks Runtime 6.1 brings several enhancements to Delta Lake:

  • Easily convert tables to Delta Lake format
  • Python APIs for Delta tables (Public Preview)
  • Dynamic File Pruning (DFP) enabled by default

Databricks Runtime 6.1 also removes several limitations in credential passthrough.

Note

Starting with the 6.1 release, Databricks Runtime supports only CPU clusters. If you want to use GPU clusters, you must use Databricks Runtime ML.

For more information, see the complete Databricks Runtime 6.1 (Unsupported) release notes.

Databricks Runtime 6.0 for Genomics GA

October 16, 2019

Databricks Runtime for Genomics (Databricks Runtime Genomics) is a variant of Databricks Runtime optimized for working with genomic and biomedical data. Beginning with release 6.0, Databricks Runtime for Genomics is generally available.

Ability to deploy an Azure Databricks workspace to your own virtual network, also known as VNet injection, is GA

October 9, 2019

We are pleased to announce the GA of the ability to deploy an Azure Databricks workspace to your own virtual network, also known as VNet injection. This option is intended for customers who require network customization and therefore don't want to use the default VNet that is created when you deploy an Azure Databricks workspace in the standard manner. With VNet injection, you can:

Deploying Azure Databricks to your own virtual network also lets you take advantage of flexible CIDR ranges (anywhere between /16 and /24 for the virtual network and up to /26 for the subnets).
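A planned address layout can be checked against these ranges before deployment. The sketch below uses Python's standard `ipaddress` module. It reads "up to /26" as "the subnet prefix may be no longer than /26", which is an interpretation of the sentence above, and the helper name `check_vnet_layout` is hypothetical.

```python
import ipaddress


def check_vnet_layout(vnet_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout fits."""
    problems = []
    vnet = ipaddress.ip_network(vnet_cidr)
    # The virtual network prefix must fall between /16 and /24.
    if not 16 <= vnet.prefixlen <= 24:
        problems.append(f"VNet {vnet} must be between /16 and /24")
    for cidr in subnet_cidrs:
        subnet = ipaddress.ip_network(cidr)
        # A prefix longer than /26 means a subnet smaller than allowed.
        if subnet.prefixlen > 26:
            problems.append(f"subnet {subnet} is smaller than /26")
        if not subnet.subnet_of(vnet):
            problems.append(f"subnet {subnet} is outside VNet {vnet}")
    return problems


# Example layout: one /16 virtual network with two /26 subnets.
issues = check_vnet_layout("10.0.0.0/16", ["10.0.0.0/26", "10.0.0.64/26"])
```

The check covers only prefix lengths and containment; it does not verify that the two subnets are distinct or unused.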

Configuration using the Azure portal UI is quick: when you create a workspace, select Deploy Azure Databricks workspace in your Virtual Network, select your virtual network, and provide CIDR ranges for two subnets. Azure Databricks updates the virtual network with the two new subnets and network security groups, adds inbound and outbound subnet traffic to the allow list, and deploys the workspace to the updated virtual network.

VNet injection at workspace deployment

If you prefer to configure the virtual network for VNet injection yourself (for example, you want to use existing subnets, use existing network security groups, or create your own security rules), you can use ARM templates supplied by Azure Databricks instead of the portal UI.

Note

If you participated in the VNet injection preview, you must upgrade your preview workspace to the GA version before January 31, 2020 to continue to receive support. See Upgrade your VNet Injection preview workspace to GA.

For details, see Deploy Azure Databricks in your Azure virtual network (VNet injection) and Connect your Azure Databricks Workspace to your on-premises network.

Non-admin Azure Databricks users can read user and group names and IDs using SCIM API

October 8 - 15, 2019: Version 3.4

Non-admin users can now invoke the SCIM API Get Users and Get Groups endpoints to read user and group display names and IDs only. All other SCIM API operations continue to require administrator access.
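The sketch below shows how a caller might pull the display names and IDs out of a Get Users or Get Groups response. It assumes the response follows the standard SCIM list format, with entries under a `Resources` key that carry `displayName` and `id` fields; the sample body is made up, and the helper `names_and_ids` is hypothetical.

```python
import json


def names_and_ids(list_response_text):
    """Return (displayName, id) pairs from a SCIM list response body."""
    resources = json.loads(list_response_text).get("Resources", [])
    return [(r.get("displayName"), r.get("id")) for r in resources]


# Made-up response body in the assumed SCIM list format.
sample = json.dumps({
    "totalResults": 2,
    "Resources": [
        {"id": "101", "displayName": "Ada Example"},
        {"id": "102", "displayName": "Lin Example"},
    ],
})
pairs = names_and_ids(sample)
```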

Workspace API returns notebook and folder object IDs

October 8 - 15, 2019: Version 3.4

The get-status and list endpoints of the Workspace API now return notebook and folder object IDs, giving you the ability to reference those objects in other API calls.
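A short sketch of reading the object ID from a get-status response follows. The URL path `/api/2.0/workspace/get-status` and the response field `object_id` reflect the Workspace API 2.0 as commonly documented and are not stated in these notes, so treat them as assumptions. The host name, workspace paths, and sample values are placeholders.

```python
import json
from urllib.parse import urlencode


def get_status_url(host, path):
    """Build the get-status URL for a workspace path (host is a placeholder)."""
    return f"{host}/api/2.0/workspace/get-status?{urlencode({'path': path})}"


def object_id_from(response_text):
    """Extract the object ID from a get-status JSON response body."""
    return json.loads(response_text)["object_id"]


# Made-up response body; only the field names are assumed to match the API.
sample = json.dumps({
    "object_type": "NOTEBOOK",
    "path": "/Shared/demo",
    "language": "PYTHON",
    "object_id": 123456,
})
notebook_id = object_id_from(sample)
```

The request itself is left out so that the example stays independent of any particular HTTP client or authentication setup.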

Databricks Runtime 6.0 ML GA

October 4, 2019

Databricks Runtime 6.0 ML includes the following updates:

  • MLflow
    • A new Spark data source for MLflow experiments provides a standard API to load MLflow experiment run data.
    • Added MLflow Java Client
    • MLflow is now promoted as a top-tier library
  • Hyperopt GA - Notable improvements since public preview include support for MLflow logging on Spark workers, correct handling of PySpark broadcast variables, and a new guide on model selection using Hyperopt.
  • Upgraded Horovod and MLflow libraries and Anaconda distribution.

Note

Only CPU clusters are supported in this release.

For more information, see the complete Databricks Runtime 6.0 ML (Unsupported) release notes.

New regions: Brazil South and France Central

October 1, 2019

Azure Databricks is now available in Brazil South (São Paulo State) and France Central (Paris).

Databricks Runtime 6.0 GA

October 1, 2019

Databricks Runtime 6.0 brings many library upgrades and new features, including:

  • New Scala and Java APIs for Delta Lake DML commands, as well as the vacuum and history utility commands.
  • Enhanced DBFS FUSE client for faster and more reliable reads and writes during model training.
  • Support for multiple matplotlib plots per notebook cell.
  • Update to Python 3.7, as well as updated numpy, pandas, matplotlib, and other libraries.
  • Sunset of Python 2 support.

Note

Only CPU clusters are supported in this release.

For more information, see the complete Databricks Runtime 6.0 (Unsupported) release notes.