September 2020

These features and Azure Databricks platform improvements were released in September 2020.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Databricks Runtime 7.3, 7.3 ML, and 7.3 Genomics are now GA

September 24, 2020

Databricks Runtime 7.3, Databricks Runtime 7.3 ML, and Databricks Runtime 7.3 for Genomics are now generally available. They bring many features and improvements, including:

  • Delta Lake performance optimizations significantly reduce overhead
  • Clone metrics
  • Delta Lake MERGE INTO improvements
  • Specify the initial position for Delta Lake Structured Streaming
  • Auto Loader improvements
  • Adaptive query execution
  • Azure Synapse Analytics connector column length control
  • Improved behavior of dbutils.credentials.showRoles
  • Simplified pandas to Spark DataFrame conversion
  • New maxResultSize in toPandas() call
  • Debuggability of pandas and PySpark UDFs
  • (ML only) Conda activation on workers
  • (Genomics only) Support for reading BGEN files with uncompressed or zstd-compressed genotypes
  • Library upgrades

For more information, see the Databricks Runtime 7.3, Databricks Runtime 7.3 ML, and Databricks Runtime 7.3 for Genomics release notes.

Single Node clusters (Public Preview)

September 23-29, 2020: Version 3.29

A Single Node cluster is a cluster consisting of a Spark driver and no Spark workers. In contrast, Standard mode clusters require at least one Spark worker to run Spark jobs. Single Node mode clusters are helpful in the following situations:

  • Running single node machine learning workloads that need Spark to load and save data
  • Lightweight exploratory data analysis (EDA)

For details, see Single Node clusters.
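
As a rough sketch, a driver-only cluster could be described in a Clusters API request body along the following lines. The Spark configuration keys, the ResourceClass tag, the runtime version string, the node type, and the cluster name are assumptions that do not come from this release note, so check them against the Single Node clusters article before relying on them.

```python
import json

# Hypothetical request body for a Single Node cluster.
# Every value below is an assumption made for illustration.
single_node_cluster = {
    "cluster_name": "single-node-eda",       # hypothetical name
    "spark_version": "7.3.x-scala2.12",      # assumed runtime version string
    "node_type_id": "Standard_DS3_v2",       # assumed Azure VM type
    "num_workers": 0,                        # a driver and no Spark workers
    "spark_conf": {
        # Assumed settings that mark the cluster as single node and
        # run Spark locally on the driver.
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

# Serialize the body as it would be sent in a create-cluster request.
payload = json.dumps(single_node_cluster, indent=2)
print(payload)
```

The defining property from the description above is the absence of workers, which the sketch expresses as num_workers set to 0.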

DBFS REST API rate limiting

September 23-29, 2020: Version 3.29

To ensure high quality of service under heavy load, Azure Databricks now enforces API rate limits for DBFS API calls. Limits are set per workspace to ensure fair usage and high availability. Automatic retries are available in Databricks CLI version 0.12.0 and above. We advise all customers to switch to the latest Databricks CLI version.

New sidebar icons

September 23-29, 2020

We've updated the sidebar in the Azure Databricks workspace UI. No big deal, but we think the new icons look pretty nice.

Running jobs limit increase

September 23-29, 2020: Version 3.29

The limit on concurrent job runs has been increased from 150 to 1000 per workspace. Runs above 150 are no longer queued in the pending state. Instead of queuing run requests that exceed the concurrent run limit, Azure Databricks returns a 429 Too Many Requests response when you request a run that cannot be started immediately. This limit increase was rolled out gradually and is now available on all workspaces in all regions.
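
Because run requests over the limit are now rejected rather than queued, a client that submits runs may want to retry on 429 itself. The following is a minimal sketch of one way to do that with exponential backoff. The function name, its parameters, and the choice of backoff strategy are assumptions for illustration, not a documented Azure Databricks client behavior.

```python
import random
import time


def submit_with_backoff(submit, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it stops returning HTTP 429 or attempts run out.

    submit is any zero-argument callable that returns (status_code, body),
    for example a wrapper around your own HTTP call that requests a run.
    """
    for attempt in range(max_attempts):
        status, body = submit()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so that many clients do not
            # retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Still rate limited after the last attempt: hand the 429 to the caller.
    return status, body
```

The sleep parameter is injectable so that the retry logic can be exercised without real waiting.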

Artifact access control lists (ACLs) in MLflow

September 23-29, 2020: Version 3.29

MLflow Experiment permissions are now enforced on artifacts in MLflow Tracking, enabling you to easily control access to your models, datasets, and other files. By default, when you create a new experiment, its run artifacts are now stored in an MLflow-managed location. The four MLflow Experiment permission levels (No Permissions, Read, Edit, and Manage) automatically apply to run artifacts stored in MLflow-managed locations as follows:

  • Edit or Manage permissions are required to log run artifacts to an experiment.
  • Read permissions are required to list and download run artifacts from an experiment.
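
The two rules above can be sketched as a small permission check. This is an illustration only: the enum and function names are hypothetical, and the sketch assumes the four levels are cumulative (Edit and Manage also allow what Read allows), which the list above does not state explicitly.

```python
from enum import IntEnum


class ExperimentPermission(IntEnum):
    """The four MLflow Experiment permission levels, ordered (assumed cumulative)."""
    NO_PERMISSIONS = 0
    READ = 1
    EDIT = 2
    MANAGE = 3


def can_read_artifacts(level):
    # Listing and downloading run artifacts needs at least Read.
    return level >= ExperimentPermission.READ


def can_log_artifacts(level):
    # Logging run artifacts needs Edit or Manage.
    return level >= ExperimentPermission.EDIT


for level in ExperimentPermission:
    print(level.name, can_read_artifacts(level), can_log_artifacts(level))
```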

For more information, see MLflow Artifact permissions.

MLflow usability improvements

September 23-29, 2020: Version 3.29

This release includes the following MLflow usability improvements:

  • The MLflow Experiment and Registered Models pages now have tips to help new users get started.
  • The model version table now shows the description text for a model version. A new column shows the first 32 characters or the first line (whichever is shorter) of the description.
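
The truncation rule for the new description column can be illustrated with a short sketch. The function name is hypothetical, and this shows only the rule as described, not how the UI implements it.

```python
def description_preview(description, max_chars=32):
    """Return the first line or the first max_chars characters, whichever is shorter."""
    first_line = description.split("\n", 1)[0]
    # Slicing the first line caps it at max_chars, which yields the shorter
    # of the two candidates.
    return first_line[:max_chars]


print(description_preview("Baseline model\nTrained on the September snapshot."))
print(description_preview("A single long line that keeps going well past the limit"))
```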

New Azure Databricks Power BI Connector (Public Preview)

September 22, 2020

Power BI Desktop version 2.85.681.0 includes an updated Azure Databricks Power BI connector that makes the integration between Azure Databricks and Power BI far more seamless and reliable. The new connector comes with the following improvements:

  • Simple connection configuration: the new Power BI Azure Databricks connector is integrated into Power BI, and you configure it using a simple dialog with a couple of clicks.
  • Secure and seamless authentication using Azure Active Directory.
  • Faster imports and optimized metadata calls, thanks to the new Azure Databricks ODBC driver, which comes with significant performance improvements.
  • Access to Azure Databricks data through Power BI respects Azure Databricks table access control and Azure storage account permissions associated with your Azure AD identity.

For more information, see Power BI.

Use customer-managed keys for DBFS root (Public Preview)

September 15, 2020

You can now use your own encryption key in Azure Key Vault to encrypt the DBFS storage account. See Configure customer-managed keys for DBFS root.

MLflow Model Serving (Public Preview)

September 9-15, 2020: Version 3.28

MLflow Model Serving is now available in Public Preview. MLflow Model Serving allows you to deploy an MLflow model registered in Model Registry as a REST API endpoint hosted and managed by Azure Databricks. When you enable model serving for a registered model, Azure Databricks creates a cluster and deploys all non-archived versions of that model.

You can query all model versions by REST API requests with standard Azure Databricks authentication. Model access rights are inherited from the Model Registry: anyone with read rights for a registered model can query any of the deployed model versions. While this service is in preview, we recommend its use for low throughput and non-critical applications.
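
As a hedged illustration of what such a request might look like, the sketch below builds (but does not send) a scoring request for one model version. The URL path, the pandas-records content type, the bearer-token header, the helper name, and all example values are assumptions that do not come from this release note. Consult the MLflow Model Serving article for the actual endpoint format.

```python
import json
import urllib.request
from urllib.parse import quote


def build_scoring_request(workspace_url, model_name, version, token, records):
    """Build a POST request for one deployed model version (assumed URL layout)."""
    # Assumed endpoint layout: <workspace>/model/<name>/<version>/invocations
    url = f"{workspace_url.rstrip('/')}/model/{quote(model_name)}/{version}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(records).encode("utf-8"),
        method="POST",
        headers={
            # Assumed: a personal access token sent as a bearer token.
            "Authorization": f"Bearer {token}",
            # Assumed input format: a JSON list of row dictionaries.
            "Content-Type": "application/json; format=pandas-records",
        },
    )


# Hypothetical workspace URL, model name, token, and input row.
request = build_scoring_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",
    "churn-model",
    2,
    "<personal-access-token>",
    [{"tenure_months": 12, "monthly_charges": 70.5}],
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the request. The sketch stops short of that so it can be read and run without a workspace.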

For more information, see MLflow Model Serving on Azure Databricks.

Clusters UI improvements

September 9-15, 2020: Version 3.28

The Clusters page now has separate tabs for All-Purpose Clusters and Job Clusters. The list on each tab is now paginated. In addition, we have fixed the delay that sometimes occurred between creating a cluster and being able to see it in the UI.

Visibility controls for jobs, clusters, notebooks, and other workspace objects

September 9-15, 2020: Version 3.28

By default, any user can see all jobs, clusters, notebooks, and folders in their workspace in the Azure Databricks UI and can list them using the Databricks API, even when access control is enabled for those objects and the user has no permissions on them.

Now any Azure Databricks admin can enable visibility controls for notebooks and folders (workspace objects), clusters, and jobs to ensure that users can view only those objects that they have been given access to through workspace, cluster, or jobs access control.

Ability to create tokens no longer permitted by default

September 9-15, 2020: Version 3.28

For workspaces created after the release of Azure Databricks platform version 3.28, users can no longer generate personal access tokens by default. Admins must explicitly grant those permissions, whether to the entire users group or on a user-by-user or group-by-group basis. Workspaces created before 3.28 was released keep the permissions that were already in place.

See Manage personal access tokens.

MLflow Model Registry supports sharing of models across workspaces

September 9, 2020

Azure Databricks now supports access to the model registry from multiple workspaces. You can now register models, track model runs, and load models across workspaces. Multiple teams can now share access to models, and organizations can use multiple workspaces to handle the different stages of development. For details, see Share models across workspaces.

This functionality requires MLflow Python client version 1.11.0 or above.
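
A client-side check for the minimum version could look like the following sketch. The helper names are hypothetical. The point of the example is that versions should be compared numerically, since a plain string comparison ranks "1.9.0" above "1.11.0".

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.11.0' into a tuple of three integers."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    # Pad short versions such as '1.11' with zeros so comparisons line up.
    return tuple(numbers + [0] * (3 - len(numbers)))


def supports_cross_workspace_registry(client_version, minimum="1.11.0"):
    # Compare component by component rather than as strings.
    return version_tuple(client_version) >= version_tuple(minimum)


print(supports_cross_workspace_registry("1.9.0"))
print(supports_cross_workspace_registry("1.11.0"))
```

With the MLflow client installed, the installed version is available as mlflow.__version__ and can be passed as client_version.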

Databricks Runtime 7.3 (Beta)

September 3, 2020

Databricks Runtime 7.3, Databricks Runtime 7.3 ML, and Databricks Runtime 7.3 for Genomics are now available as Beta releases.

For information, see the Databricks Runtime 7.3, Databricks Runtime 7.3 ML, and Databricks Runtime 7.3 for Genomics release notes.

Azure Databricks workload type name change

September 1, 2020

The names of the workload types used by your clusters have been changed:

  • Data Engineering -> Jobs Compute
  • Data Engineering Light -> Jobs Light Compute
  • Data Analytics -> All-purpose Compute

These new names will appear on invoices and in the EA portal in combination with your pricing plan (for example, “Premium - Jobs Compute - DBU”). For details, see Azure Databricks Meters.
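
If you keep cost reports or scripts keyed on the old workload type names, the renames listed above can be captured in a small lookup table. This is a minimal sketch. WORKLOAD_RENAMES and rename_workload are hypothetical names, not part of any Azure Databricks API.

```python
# Old workload type name -> new workload type name, as listed above.
WORKLOAD_RENAMES = {
    "Data Engineering": "Jobs Compute",
    "Data Engineering Light": "Jobs Light Compute",
    "Data Analytics": "All-purpose Compute",
}


def rename_workload(name):
    """Return the new workload type name, or the input unchanged if it is not an old name."""
    # An exact-match lookup avoids turning "Data Engineering Light" into
    # "Jobs Compute Light", which a substring replacement would produce.
    return WORKLOAD_RENAMES.get(name, name)


for old_name in WORKLOAD_RENAMES:
    print(old_name, "->", rename_workload(old_name))
```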

The user interface has also changed in platform version 3.27 (targeted for staged release between August 25 and September 3):

On the Clusters page, the list headings have changed:

  • Interactive Clusters -> All-Purpose Clusters
  • Automated Clusters -> Job Clusters

When you configure a cluster for a job, the Cluster Type options have changed:

  • New Automated Cluster -> New Job Cluster
  • Existing Interactive Cluster -> Existing All-Purpose Cluster