Cluster libraries

Cluster libraries can be used by all notebooks running on a cluster. You can install a cluster library directly from a public repository such as PyPI or Maven, from a previously installed workspace library, or with an init script.

Install a library on a cluster

There are two primary ways to install a library on a cluster:

  • Install a workspace library that has already been uploaded to the workspace.
  • Install a library for use with a specific cluster only.

If your library requires custom configuration, you may not be able to install it with either of these methods. Instead, you can install the library using an init script that runs at cluster creation time.

Note

When you install a library on a cluster, a notebook that is already attached to that cluster does not immediately see the new library. You must first detach the notebook and then reattach it to the cluster.


Workspace library

Note

Starting with Databricks Runtime 7.2, Azure Databricks processes all workspace libraries in the order that they were installed on the cluster. On Databricks Runtime 7.1 and below, Azure Databricks processes Maven and CRAN libraries in the order they are installed on the cluster.

If libraries depend on one another, you might need to pay attention to the order in which they are installed on the cluster.

To install a library that already exists in the workspace, you can start from the cluster UI or the library UI:

Cluster

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Click Install New.
  5. In the Library Source button list, select Workspace.
  6. Select a workspace library.
  7. Click Install.
  8. To configure the library to be installed on all clusters:
    1. Click the library.
    2. Select the Install automatically on all clusters checkbox.
    3. Click Confirm.

Library

  1. Go to the folder containing the library.
  2. Click the library name.
  3. Do one of the following:
    • To configure the library to be installed on all clusters, select the Install automatically on all clusters checkbox and click Confirm.

      Important

      This option does not install the library on clusters running Databricks Runtime 7.0 and above.

    • Select the checkbox next to the cluster that you want to install the library on and click Install.

The library is installed on the cluster.

Cluster-installed library

You can install a library on a specific cluster without making it available as a workspace library.

To install a library on a cluster:

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Click Install New.
  5. Follow one of the methods for creating a workspace library. After you click Create, the library is installed on the cluster.
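The same per-cluster installation can also be scripted. The following is a minimal sketch, not an official client. It uses curl to call POST /api/2.0/libraries/install from the Databricks Libraries API 2.0. It assumes the following:

  • The environment variable DATABRICKS_HOST holds your workspace URL.
  • The environment variable DATABRICKS_TOKEN holds a personal access token.
  • The cluster ID and package name contain no double quotes. The JSON body is built with printf and is not escaped.

The function names are illustrative.

```bash
#!/bin/bash
# Minimal sketch: install a PyPI package on one cluster through the
# Libraries API 2.0. Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set
# and that arguments contain no double quotes (no JSON escaping is done).

# Print the JSON request body for POST /api/2.0/libraries/install.
build_install_payload() {
  local cluster_id="$1" package="$2"
  printf '{"cluster_id": "%s", "libraries": [{"pypi": {"package": "%s"}}]}\n' \
    "$cluster_id" "$package"
}

# Send the request. Installation continues in the background after the call
# returns; check the cluster's Libraries tab for the install status.
install_pypi_library() {
  curl --fail --silent --show-error \
    -X POST "${DATABRICKS_HOST}/api/2.0/libraries/install" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(build_install_payload "$1" "$2")"
}
```

For example, install_pypi_library 1234-567890-abcde123 astropy would request astropy on the cluster with that (hypothetical) ID. For a Maven library, the entry in the libraries list takes the form {"maven": {"coordinates": "group:artifact:version"}} instead of the pypi entry.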

Init script

If your library requires custom configuration, you may not be able to install it using the workspace or cluster library interface. Instead, you can install the library using an init script.

Conda is available only on Databricks Runtime ML, not on the base Databricks Runtime. The following example init script therefore applies only to Databricks Runtime for Machine Learning clusters. It uses the Conda package manager to install a Python library when the cluster initializes:

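#!/bin/bash
# Exit on the first error and echo each command to the init script log.
set -ex
# Log the version of the cluster's Python interpreter.
/databricks/python/bin/python -V
# Load conda's shell functions, then activate the cluster's Python environment.
. /databricks/conda/etc/profile.d/conda.sh
conda activate /databricks/python
# Install the package without prompting for confirmation.
conda install -y astropy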
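Conda is not available on the base Databricks Runtime, so a script meant for both runtimes has to choose an installer. The following sketch is an illustration, not a documented Databricks script. It uses Conda when the conda.sh file from the example above exists and otherwise falls back to pip. It assumes that pip lives at /databricks/python/bin/pip, next to the python binary used above. The function names and the PACKAGE variable are hypothetical. The final check makes the script do nothing on machines where /databricks/python/bin/python does not exist.

```bash
#!/bin/bash
# Hypothetical init script sketch: install one Python package with Conda on
# Databricks Runtime ML and with pip elsewhere. Paths are assumptions based
# on the Conda example above.
set -ex

PACKAGE="${PACKAGE:-astropy}"
CONDA_SH="/databricks/conda/etc/profile.d/conda.sh"

# Print "conda" if the given conda.sh file exists, otherwise "pip".
pick_installer() {
  if [ -f "$1" ]; then
    echo conda
  else
    echo pip
  fi
}

install_package() {
  case "$(pick_installer "$CONDA_SH")" in
    conda)
      . "$CONDA_SH"
      conda activate /databricks/python
      conda install -y "$PACKAGE"
      ;;
    pip)
      # Assumed location of pip inside the cluster's Python environment.
      /databricks/python/bin/pip install "$PACKAGE"
      ;;
  esac
}

# Only install when the Databricks Python interpreter is present.
if [ -x /databricks/python/bin/python ]; then
  install_package
fi
```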

Uninstall a library from a cluster

Note

When you uninstall a library from a cluster, the library is removed only when you restart the cluster. Until you restart the cluster, the status of the uninstalled library appears as Uninstall pending restart.

To uninstall a library, you can start from a cluster or a library:

Cluster

  1. Click the clusters icon in the sidebar.
  2. Click a cluster name.
  3. Click the Libraries tab.
  4. Select the checkbox next to the library you want to uninstall, click Uninstall, then Confirm. The Status changes to Uninstall pending restart.

Library

  1. Go to the folder containing the library.
  2. Click the library name.
  3. Select the checkbox next to the cluster you want to uninstall the library from, click Uninstall, then Confirm. The Status changes to Uninstall pending restart.
  4. Click the cluster name to go to the cluster detail page.

Click Restart and Confirm to uninstall the library. The library is removed from the cluster's Libraries tab.
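The uninstall steps can be scripted in a similar way. This minimal sketch calls POST /api/2.0/libraries/uninstall from the Libraries API 2.0. It then calls POST /api/2.0/clusters/restart from the Clusters API 2.0, because, as noted above, the uninstall takes effect only after a restart. It assumes the following:

  • DATABRICKS_HOST and DATABRICKS_TOKEN are set.
  • The cluster is running.
  • The package string matches the one used at install time, including any pinned version.

The function names are illustrative.

```bash
#!/bin/bash
# Minimal sketch: uninstall a PyPI library, then restart the cluster so the
# uninstall takes effect. Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set
# and that arguments contain no double quotes (no JSON escaping is done).

# JSON body for POST /api/2.0/libraries/uninstall.
build_uninstall_payload() {
  printf '{"cluster_id": "%s", "libraries": [{"pypi": {"package": "%s"}}]}\n' "$1" "$2"
}

# JSON body for POST /api/2.0/clusters/restart.
build_restart_payload() {
  printf '{"cluster_id": "%s"}\n' "$1"
}

# POST a JSON body ($2) to a workspace REST endpoint path ($1).
databricks_post() {
  curl --fail --silent --show-error -X POST "${DATABRICKS_HOST}$1" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Mark the library as "Uninstall pending restart", then restart only if
# that request succeeded.
uninstall_pypi_library() {
  databricks_post /api/2.0/libraries/uninstall "$(build_uninstall_payload "$1" "$2")" &&
    databricks_post /api/2.0/clusters/restart "$(build_restart_payload "$1")"
}
```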

View the libraries installed on a cluster

  1. Click the clusters icon in the sidebar.
  2. Click the cluster name.
  3. Click the Libraries tab. For each library, the tab displays the name and version, type, install status, and, if uploaded, the source file.
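Outside the UI, the same per-library install status is returned by GET /api/2.0/libraries/cluster-status in the Libraries API 2.0. The following sketch assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set, and its function names are illustrative. Its grep and sed parsing is a deliberately crude stand-in for a real JSON parser such as jq. It extracts only the status values, such as INSTALLED or UNINSTALL_ON_RESTART.

```bash
#!/bin/bash
# Minimal sketch: list the install status of every library on a cluster.
# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set.

# GET /api/2.0/libraries/cluster-status?cluster_id=<id>
fetch_cluster_status() {
  curl --fail --silent --show-error -G \
    "${DATABRICKS_HOST}/api/2.0/libraries/cluster-status" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data-urlencode "cluster_id=$1"
}

# Read a cluster-status JSON response on stdin and print one status per line.
# Crude text matching, not a JSON parser: it only finds "status": "VALUE" pairs.
extract_statuses() {
  grep -o '"status": *"[A-Z_]*"' | sed 's/.*"\([A-Z_]*\)"$/\1/'
}
```

For example, fetch_cluster_status 1234-567890-abcde123 | extract_statuses would print one status line per library on that (hypothetical) cluster.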

Update a cluster-installed library

To update a cluster-installed library, uninstall the old version of the library and install a new version.
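An update is therefore an uninstall followed by an install. The following dry-run sketch only prints the Libraries API and Clusters API requests that an update would involve; it does not send them. It runs the steps in this order:

  1. Uninstall the old version.
  2. Restart the cluster so that the uninstall takes effect.
  3. Install the new version.

That ordering, the function names, and the pinned versions in the test values are assumptions for illustration.

```bash
#!/bin/bash
# Dry-run sketch: print the REST calls for replacing one pinned PyPI version
# with another on a cluster. Nothing is sent; arguments must not contain
# double quotes (no JSON escaping is done).

# JSON body shared by the install and uninstall endpoints.
lib_body() {
  printf '{"cluster_id": "%s", "libraries": [{"pypi": {"package": "%s"}}]}' "$1" "$2"
}

# Usage: plan_library_update CLUSTER_ID OLD_PACKAGE NEW_PACKAGE
plan_library_update() {
  local cluster_id="$1" old_pkg="$2" new_pkg="$3"
  echo "POST /api/2.0/libraries/uninstall $(lib_body "$cluster_id" "$old_pkg")"
  echo "POST /api/2.0/clusters/restart {\"cluster_id\": \"$cluster_id\"}"
  echo "POST /api/2.0/libraries/install $(lib_body "$cluster_id" "$new_pkg")"
}
```

Each printed line pairs an endpoint with the JSON body it would receive, so the plan can be reviewed before any request is sent.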