May 2018

Releases are staged. Your Azure Databricks account may not be updated until a week after the initial release date.

General Data Protection Regulation (GDPR)

May 24, 2018: Version 2.72

To meet the requirements of the European Union General Data Protection Regulation (GDPR), which goes into effect on May 25, 2018, we have made a number of modifications to the Azure Databricks platform to give you more control over data retention at both the account and user level. Updates include:

  • Cluster delete: permanently delete a cluster configuration using the UI or the Clusters API. See Delete a cluster.
  • Workspace purge (released in version 2.71): permanently delete workspace objects, such as entire notebooks, individual notebook cells, individual notebook comments, and notebook revision history. See Manage workspace storage.
  • Notebook revision history purge:
    • Permanently delete the revision history of all notebooks in a workspace for a defined time frame. See Manage workspace storage.
    • Permanently delete a single notebook revision or the entire revision history of a notebook. See Version control.

For information about deleting your Azure Databricks service or canceling your Azure account, see Manage your subscription.
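
As a minimal sketch of the cluster delete item above, the following Python builds (but does not send) a permanent-delete call against the Clusters API. It assumes the `api/2.0/clusters/permanent-delete` endpoint and bearer-token authentication with a personal access token; the workspace URL, token, and cluster ID are placeholders, and the helper name `build_permanent_delete_request` is hypothetical.

```python
import json
import urllib.request


def build_permanent_delete_request(host: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build, without sending, a Clusters API permanent-delete request."""
    url = f"{host.rstrip('/')}/api/2.0/clusters/permanent-delete"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values (assumptions): substitute your own workspace URL,
# personal access token, and cluster ID.
request = build_permanent_delete_request(
    "https://example-workspace.azuredatabricks.net",
    "PERSONAL_ACCESS_TOKEN",
    "CLUSTER_ID",
)

# Sending is left commented out because permanent deletion cannot be undone:
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to review the target URL and payload before running a destructive call.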

Azure Databricks users must belong to Azure AD tenant

May 24, 2018: Version 2.72

Users can now sign in to Azure Databricks only if they belong to the Azure Active Directory (Azure AD) tenant of the Azure Databricks workspace. If you have users who do not belong to the Azure AD tenant, you can add them as standard or guest users.

HorovodEstimator

May 29, 2018: Version 2.72

Added documentation and a notebook for HorovodEstimator, an MLlib-style estimator API that leverages Uber's Horovod framework. HorovodEstimator facilitates distributed, multi-GPU training of deep neural networks on Spark DataFrames, simplifying the integration of ETL in Spark with model training in TensorFlow. See HorovodEstimator: distributed deep learning with Horovod and Apache Spark MLlib.

MLeap ML Model Export

May 24, 2018: Version 2.72

Added documentation and notebooks on using MLeap on Azure Databricks. MLeap allows you to deploy machine learning pipelines from Apache Spark and scikit-learn to a portable format and execution engine. See MLeap ML Model Export.

Even more GPU cluster types

May 24, 2018: Version 2.72

In addition to the Azure NC instance types (NC12 and NC24) that we added in version 2.71, we now support the NCv3 instance type series (NC6s_v3, NC12s_v3, and NC24s_v3) on Azure Databricks clusters. NC and NCv3 instances provide GPUs to power image processing, text analysis, and other machine learning and deep learning tasks that are computationally challenging and demand superior performance.

See GPU-enabled clusters.

Notebook cells: hide and show

May 24, 2018: Version 2.72

New indicators and messaging make it easier to show notebook cell contents after they've been hidden. See Hide and show cell content.

May 22, 2018

We have replaced our doc site search with a better search tool. You'll see even more search improvements over the coming weeks.

Note

Search may look broken if you try it shortly after the new search is deployed. Clear your browser cache to see the new search experience.

Databricks Runtime 4.1 ML for Machine Learning (Beta)

May 17, 2018

Databricks Runtime ML (Beta) provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, Keras, and XGBoost.

Databricks Runtime ML lets you start a Databricks cluster with all of the libraries required for distributed TensorFlow training. It ensures the compatibility of the libraries included on the cluster (between TensorFlow and CUDA / cuDNN, for example) and substantially decreases the cluster start-up time compared to using init scripts.

Note

Databricks Runtime 4.1 ML is available only in the Premium SKU.

See the complete release notes for Databricks Runtime 4.1 ML (Beta).

Databricks Delta

May 17, 2018

Databricks Delta is now available in Private Preview to Azure Databricks users. Contact your account manager or sign up at https://databricks.com/product/databricks-delta. This release is a candidate for the upcoming GA release.

For more information, see Databricks Runtime 4.1 and Delta Lake.

Display() support for image data types

May 17, 2018

In Databricks Runtime 4.1, display() now renders columns containing image data types as rich HTML.

See Images.

GPU cluster types

May 15, 2018: Version 2.71

We're pleased to announce support for Azure NC instance types (NC12 and NC24) on Azure Databricks clusters. NC instances provide GPUs to power image processing, text analysis, and other machine learning and deep learning tasks that are computationally challenging and demand superior performance.

Azure Databricks also provides pre-installed NVIDIA drivers and libraries configured for GPUs, along with material for getting started with several popular deep learning libraries.

See also:

Secret management GA

May 15, 2018: Version 2.71

Secret management, which had been in private preview, is now GA. It provides powerful tools for managing the credentials you need for authenticating to external data sources. Instead of typing your credentials directly into a notebook, use Databricks secret management to store and reference your credentials in notebooks and jobs. To manage secrets, you can use the Secrets CLI to access the Secrets API.

Note

Secret management requires Databricks Runtime 4.0 or above and Databricks CLI 0.7.1 or above.

See Secret management.
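
To illustrate referencing credentials instead of typing them into a notebook, here is a minimal Python sketch. It assumes that, inside a notebook, `dbutils.secrets.get(scope, key)` returns the secret value as a string; the scope name `jdbc`, the key names `username` and `password`, and the helper `build_jdbc_properties` are hypothetical. A stub lookup stands in for the secret store so the sketch also runs outside a notebook.

```python
from typing import Callable, Dict


def build_jdbc_properties(get_secret: Callable[[str, str], str], scope: str) -> Dict[str, str]:
    """Assemble JDBC connection properties from secrets rather than literals."""
    return {
        "user": get_secret(scope, "username"),
        "password": get_secret(scope, "password"),
    }


# In a notebook, you would pass the real lookup (assumed available there):
#   props = build_jdbc_properties(dbutils.secrets.get, "jdbc")

# Outside a notebook, a stub dictionary stands in for the secret store.
fake_store = {
    ("jdbc", "username"): "reader",
    ("jdbc", "password"): "not-a-real-password",
}
props = build_jdbc_properties(lambda scope, key: fake_store[(scope, key)], "jdbc")
```

Passing the lookup function as a parameter keeps credential values out of the notebook source and lets the same helper be exercised with a stub.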

Secrets API endpoint and CLI command changes

May 15, 2018: Version 2.71

The following changes were made to the Secrets API endpoints:

  • For all endpoints, the root path was changed from /secret to /secrets.
  • For the secrets endpoint, /secret/secrets was collapsed to /secrets/.
  • The write method was changed to put.

Databricks CLI 0.7.1 includes updates to Secrets commands to align with these updated API endpoints.

See Secrets API and Secret management.
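
If you have scripts that call the old endpoints, the three rules above can be sketched as a small path rewriter. This is an illustration under assumptions, not an official migration tool: the helper name `migrate_secrets_path` is hypothetical, paths are taken relative to the API version root, the sample paths in the usage note are illustrative, and the write-to-put rename is applied to any trailing `write` segment.

```python
def migrate_secrets_path(old_path: str) -> str:
    """Translate an old-style Secrets API path to its version 2.71 form."""
    segments = [s for s in old_path.strip("/").split("/") if s]
    if not segments or segments[0] != "secret":
        return old_path  # not an old-style Secrets path; leave it untouched

    rest = segments[1:]
    # The nested /secret/secrets prefix collapses into the new root.
    if rest and rest[0] == "secrets":
        rest = rest[1:]
    # The write method is now called put.
    if rest and rest[-1] == "write":
        rest[-1] = "put"
    # The root path is /secrets instead of /secret.
    return "/" + "/".join(["secrets"] + rest)
```

For example, `migrate_secrets_path("/secret/secrets/write")` yields `/secrets/put`, `migrate_secrets_path("/secret/scopes/create")` yields `/secrets/scopes/create`, and a path that already starts with `/secrets` is returned unchanged.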

Cluster pinning

May 15, 2018: Version 2.71

You can now pin a cluster to the Clusters list. This lets you retain the configuration of clusters that were terminated more than 30 days ago.

In addition, the Clusters page now displays all clusters that were terminated within 30 days (increased from 7 days).

See Pin a cluster.

Cluster autostart

May 15, 2018: Version 2.71

Before this release, jobs scheduled to run on Terminated clusters failed. For clusters created in Azure Databricks version 2.71 and above, a command from a JDBC/ODBC interface or a job run assigned to an existing terminated cluster automatically restarts that cluster. See JDBC connect and Create a job.

Autostart allows you to configure clusters to autoterminate, without requiring manual intervention to restart the clusters for scheduled jobs. Furthermore, you can schedule cluster initialization by scheduling a job that restarts terminated clusters at a specified time.

Cluster access control is enforced and job owner permissions are checked as usual.
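
The combination of autotermination and a scheduled job can be sketched as request payloads. The example below is a sketch under assumptions: it uses the `autotermination_minutes` cluster field and the `existing_cluster_id`, `notebook_task`, and `schedule` (`quartz_cron_expression`, `timezone_id`) job fields of the REST API 2.0; the job name, notebook path, cluster ID placeholder, and the helper `build_scheduled_job` are hypothetical.

```python
import json

# Fragment of a cluster specification (other required fields omitted):
# terminate the cluster automatically after 30 idle minutes.
cluster_settings_fragment = {
    "autotermination_minutes": 30,
}


def build_scheduled_job(cluster_id: str, notebook_path: str, cron: str, timezone: str) -> dict:
    """Build a job specification that runs on an existing cluster on a schedule."""
    return {
        "name": "nightly-etl-job",  # hypothetical job name
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        },
    }


# Quartz cron "0 0 2 * * ?" fires every day at 02:00 in the given time zone.
job_spec = build_scheduled_job(
    "CLUSTER_ID",                      # placeholder for a real cluster ID
    "/Users/someone@example.com/etl",  # hypothetical notebook path
    "0 0 2 * * ?",
    "UTC",
)
print(json.dumps(job_spec, indent=2))
```

With a setup like this, the cluster can shut itself down when idle, and the scheduled run restarts it when the job is next due.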

Workspace purging

May 15, 2018: Version 2.71

As part of our ongoing effort to comply with the European Union General Data Protection Regulation (GDPR), we have added the ability to purge workspace objects, such as entire notebooks, individual notebook cells, individual notebook comments, and notebook revision history. We will release more functionality and documentation to support GDPR compliance in the coming weeks.

See Manage workspace storage.

Databricks CLI 0.7.1

May 10, 2018

Databricks CLI 0.7.1 includes updates to Secrets commands to align with updated API endpoints.

See Databricks CLI and Secret management.