Libraries API

The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster.

Important

To access Databricks REST APIs, you must authenticate.

All cluster statuses

Endpoint | HTTP Method
2.0/libraries/all-cluster-statuses | GET

Get the status of all libraries on all clusters. A status is available for every library installed on a cluster via the API or the libraries UI, as well as for libraries set to be installed on all clusters via the libraries UI. If a library has been set to be installed on all clusters, is_library_for_all_clusters is true, even if the library was also installed on this specific cluster.

Example response

{
  "statuses": [
    {
      "cluster_id": "11203-my-cluster",
      "library_statuses": [
        {
          "library": {
            "jar": "dbfs:/mnt/libraries/library.jar"
          },
          "status": "INSTALLING",
          "messages": [],
          "is_library_for_all_clusters": false
        }
      ]
    },
    {
      "cluster_id": "20131-my-other-cluster",
      "library_statuses": [
        {
          "library": {
            "egg": "dbfs:/mnt/libraries/library.egg"
          },
          "status": "ERROR",
          "messages": ["Could not download library"],
          "is_library_for_all_clusters": false
        }
      ]
    }
  ]
}

Response structure

Field Name | Type | Description
statuses | An array of ClusterLibraryStatuses | A list of cluster statuses.
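As an illustration of walking this response shape, the following minimal sketch collects every library entry that carries at least one message. The function name libraries_with_messages is hypothetical, and the sample body is a condensed copy of the example response above.

```python
import json

# Sample body shaped like the example response above.
response_text = """
{
  "statuses": [
    {
      "cluster_id": "11203-my-cluster",
      "library_statuses": [
        {"library": {"jar": "dbfs:/mnt/libraries/library.jar"},
         "status": "INSTALLING", "messages": [],
         "is_library_for_all_clusters": false}
      ]
    },
    {
      "cluster_id": "20131-my-other-cluster",
      "library_statuses": [
        {"library": {"egg": "dbfs:/mnt/libraries/library.egg"},
         "status": "ERROR", "messages": ["Could not download library"],
         "is_library_for_all_clusters": false}
      ]
    }
  ]
}
"""


def libraries_with_messages(body):
    """Yield (cluster_id, library, status, messages) for entries that have messages."""
    for cluster in body.get("statuses", []):
        for entry in cluster.get("library_statuses", []):
            if entry.get("messages"):
                yield (cluster["cluster_id"], entry["library"],
                       entry["status"], entry["messages"])


rows = list(libraries_with_messages(json.loads(response_text)))
for row in rows:
    print(row)
```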

Cluster status

Endpoint | HTTP Method
2.0/libraries/cluster-status | GET

Get the status of libraries on a cluster. A status is available for every library installed on the cluster via the API or the libraries UI, as well as for libraries set to be installed on all clusters via the libraries UI. If a library has been set to be installed on all clusters, is_library_for_all_clusters is true, even if the library was also installed on the cluster.

Example request

/libraries/cluster-status?cluster_id=11203-my-cluster

Example response

{
  "cluster_id": "11203-my-cluster",
  "library_statuses": [
    {
      "library": {
        "jar": "dbfs:/mnt/libraries/library.jar"
      },
      "status": "INSTALLED",
      "messages": [],
      "is_library_for_all_clusters": false
    },
    {
      "library": {
        "pypi": {
          "package": "beautifulsoup4"
        }
      },
      "status": "INSTALLING",
      "messages": ["Successfully resolved package from PyPI"],
      "is_library_for_all_clusters": false
    },
    {
      "library": {
        "cran": {
          "package": "ada",
          "repo": "https://cran.us.r-project.org"
        }
      },
      "status": "FAILED",
      "messages": ["R package installation is not supported on this spark version.\nPlease upgrade to Runtime 3.2 or higher"],
      "is_library_for_all_clusters": false
    }
  ]
}

Request structure

Field Name | Type | Description
cluster_id | STRING | Unique identifier of the cluster whose status should be retrieved. This field is required.
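Since cluster_id travels as a query parameter, the request URL can be assembled as in the sketch below. The host name is a placeholder, the /api/ path prefix is an assumption about how the workspace exposes the endpoint, and cluster_status_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def cluster_status_url(host, cluster_id):
    """Build the GET URL for 2.0/libraries/cluster-status.

    Assumption: the endpoint is served under <host>/api/.
    """
    query = urlencode({"cluster_id": cluster_id})
    return f"{host.rstrip('/')}/api/2.0/libraries/cluster-status?{query}"


# Placeholder host; substitute your own workspace URL.
url = cluster_status_url("https://example.cloud.databricks.com", "11203-my-cluster")
print(url)
```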

Response structure

Field Name | Type | Description
cluster_id | STRING | Unique identifier for the cluster.
library_statuses | An array of LibraryFullStatus | Status of all libraries on the cluster.

Install

Endpoint | HTTP Method
2.0/libraries/install | POST

Install libraries on a cluster. The installation is asynchronous: it completes in the background after the request.

Important

This call fails if the cluster is terminated.

Installing a wheel library on a cluster is like running the pip command against the wheel file directly on the driver and executors. All the dependencies specified in the library's setup.py file are installed, which requires the library name to satisfy the wheel file name convention.

The installation on the executors happens only when a new task is launched. With Databricks Runtime 7.1 and below, the installation order of libraries is nondeterministic. For wheel libraries, you can ensure a deterministic installation order by creating a zip file with the suffix .wheelhouse.zip that includes all the wheel files.
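A .wheelhouse.zip archive can be produced with the Python standard library. The sketch below is one possible approach, not a prescribed tool: build_wheelhouse is a hypothetical helper that bundles every .whl file in a local directory, and uploading the result to DBFS is left out.

```python
import zipfile
from pathlib import Path


def build_wheelhouse(wheel_dir, output_name):
    """Bundle every .whl file in wheel_dir into <output_name>.wheelhouse.zip.

    Returns the path of the archive, written next to the wheels.
    """
    wheel_dir = Path(wheel_dir)
    archive = wheel_dir / f"{output_name}.wheelhouse.zip"
    wheels = sorted(wheel_dir.glob("*.whl"))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for wheel in wheels:
            # Store each wheel at the archive root under its own file name.
            zf.write(wheel, arcname=wheel.name)
    return archive
```

For example, build_wheelhouse("dist", "wheel-libraries") would write dist/wheel-libraries.wheelhouse.zip, which matches the file name used in the example request for this endpoint.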

Example request

{
  "cluster_id": "10201-my-cluster",
  "libraries": [
    {
      "jar": "dbfs:/mnt/libraries/library.jar"
    },
    {
      "egg": "dbfs:/mnt/libraries/library.egg"
    },
    {
      "whl": "dbfs:/mnt/libraries/mlflow-0.0.1.dev0-py2-none-any.whl"
    },
    {
      "whl": "dbfs:/mnt/libraries/wheel-libraries.wheelhouse.zip"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2",
        "exclusions": ["slf4j:slf4j"]
      }
    },
    {
      "pypi": {
        "package": "simplejson",
        "repo": "https://my-pypi-mirror.com"
      }
    },
    {
      "cran": {
        "package": "ada",
        "repo": "https://cran.us.r-project.org"
      }
    }
  ]
}

Request structure

Field Name | Type | Description
cluster_id | STRING | Unique identifier for the cluster on which to install these libraries. This field is required.
libraries | An array of Library | The libraries to install.
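The request above can be prepared with the standard library alone. This is a sketch under stated assumptions: the host is a placeholder, the /api/ path prefix and the bearer-token Authorization header are assumptions about how your workspace authenticates, and build_install_request is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_install_request(host, token, cluster_id, libraries):
    """Prepare a POST to 2.0/libraries/install without sending it."""
    payload = {"cluster_id": cluster_id, "libraries": libraries}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: token-based authentication with a bearer header.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_install_request(
    "https://example.cloud.databricks.com",  # placeholder host
    "my-token",                              # placeholder credential
    "10201-my-cluster",
    [
        {"jar": "dbfs:/mnt/libraries/library.jar"},
        {"pypi": {"package": "simplejson"}},
    ],
)
# urllib.request.urlopen(request) would submit it. Because installation is
# asynchronous, progress is then read from the cluster-status endpoint.
```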

Uninstall

Endpoint | HTTP Method
2.0/libraries/uninstall | POST

Set libraries to be uninstalled on a cluster. The libraries are not uninstalled until the cluster is restarted. Uninstalling libraries that are not installed on the cluster has no effect and is not an error.

Example request

{
  "cluster_id": "10201-my-cluster",
  "libraries": [
    {
      "jar": "dbfs:/mnt/libraries/library.jar"
    },
    {
      "cran": {
        "package": "ada"
      }
    }
  ]
}

Request structure

Field Name | Type | Description
cluster_id | STRING | Unique identifier for the cluster on which to uninstall these libraries. This field is required.
libraries | An array of Library | The libraries to uninstall.
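Because an uninstall only takes effect at the next restart, it can be useful to list which libraries are still waiting to be removed. The sketch below filters a cluster-status response body for the UNINSTALL_ON_RESTART status; the sample body is hypothetical and pending_removal is an illustrative helper name.

```python
def pending_removal(cluster_status):
    """Return the libraries marked UNINSTALL_ON_RESTART in a cluster-status body."""
    return [
        entry["library"]
        for entry in cluster_status.get("library_statuses", [])
        if entry.get("status") == "UNINSTALL_ON_RESTART"
    ]


# Hypothetical cluster-status body after an uninstall request, before a restart.
sample = {
    "cluster_id": "10201-my-cluster",
    "library_statuses": [
        {"library": {"jar": "dbfs:/mnt/libraries/library.jar"},
         "status": "UNINSTALL_ON_RESTART", "messages": [],
         "is_library_for_all_clusters": False},
        {"library": {"pypi": {"package": "simplejson"}},
         "status": "INSTALLED", "messages": [],
         "is_library_for_all_clusters": False},
    ],
}

print(pending_removal(sample))
```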

Data structures

ClusterLibraryStatuses

Field Name | Type | Description
cluster_id | STRING | Unique identifier for the cluster.
library_statuses | An array of LibraryFullStatus | Status of all libraries on the cluster.

Library

Field Name | Type | Description
jar OR egg OR whl OR pypi OR maven OR cran | STRING OR STRING OR STRING OR PythonPyPiLibrary OR MavenLibrary OR RCranLibrary | See the descriptions below.

If jar, DBFS URI of the jar to be installed. For example: { "jar": "dbfs:/mnt/databricks/library.jar" }.

If egg, DBFS URI of the egg to be installed. For example: { "egg": "dbfs:/my/egg" }.

If whl, URI of the wheel or zipped wheels to be installed. Only DBFS URIs are supported. For example: { "whl": "dbfs:/my/whl" }.

The wheel file name needs to use the correct convention. If zipped wheels are to be installed, the file name suffix should be .wheelhouse.zip.

If pypi, specification of a PyPI library to be installed. For example:
{ "package": "simplejson" }

If maven, specification of a Maven library to be installed. For example:
{ "coordinates": "org.jsoup:jsoup:1.7.2" }

If cran, specification of a CRAN library to be installed.
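The rules above can be checked on the client before sending a request. The following sketch is an illustrative validator, not part of the API: validate_library is a hypothetical name, and the dbfs:/ prefix test is a simplification of "DBFS URI" based on the examples shown.

```python
STRING_KINDS = {"jar", "egg", "whl"}
# Each object-valued kind and the field its structure requires.
OBJECT_KINDS = {"pypi": "package", "maven": "coordinates", "cran": "package"}


def validate_library(library):
    """Return the library kind, or raise ValueError if the spec is malformed."""
    if len(library) != 1:
        raise ValueError("a Library must contain one of jar, egg, whl, pypi, maven, cran")
    (kind, spec), = library.items()
    if kind in STRING_KINDS:
        # Simplified check: treat any string starting with dbfs:/ as a DBFS URI.
        if not isinstance(spec, str) or not spec.startswith("dbfs:/"):
            raise ValueError(f"{kind} must be a DBFS URI string")
    elif kind in OBJECT_KINDS:
        required = OBJECT_KINDS[kind]
        if not isinstance(spec, dict) or not spec.get(required):
            raise ValueError(f"{kind} must be an object with a '{required}' field")
    else:
        raise ValueError(f"unknown library kind: {kind}")
    return kind


print(validate_library({"jar": "dbfs:/mnt/databricks/library.jar"}))
print(validate_library({"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}}))
```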

LibraryFullStatus

The status of the library on a specific cluster.

Field Name | Type | Description
library | Library | Unique identifier for the library.
status | LibraryInstallStatus | Status of installing the library on the cluster.
messages | An array of STRING | All the info and warning messages that have occurred so far for this library.
is_library_for_all_clusters | BOOL | Whether the library was set to be installed on all clusters via the libraries UI.

MavenLibrary

Field Name | Type | Description
coordinates | STRING | Gradle-style Maven coordinates. For example: org.jsoup:jsoup:1.7.2. This field is required.
repo | STRING | Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched.
exclusions | An array of STRING | List of dependencies to exclude. For example: ["slf4j:slf4j", "*:hadoop-client"].

Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

PythonPyPiLibrary

Field Name | Type | Description
package | STRING | The name of the PyPI package to install. An optional exact version specification is also supported. Examples: simplejson and simplejson==3.8.0. This field is required.
repo | STRING | The repository where the package can be found. If not specified, the default pip index is used.
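To make the two accepted forms of package concrete, the sketch below splits a value into its name and optional exact version. split_pypi_package is a hypothetical helper for client-side inspection only; the API itself takes the string as written.

```python
def split_pypi_package(package):
    """Split 'name' or 'name==version' into (name, version or None)."""
    name, separator, version = package.partition("==")
    return name.strip(), (version.strip() if separator else None)


print(split_pypi_package("simplejson"))
print(split_pypi_package("simplejson==3.8.0"))
```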

RCranLibrary

Field Name | Type | Description
package | STRING | The name of the CRAN package to install. This field is required.
repo | STRING | The repository where the package can be found. If not specified, the default CRAN repo is used.

LibraryInstallStatus

The status of a library on a specific cluster.

Status | Description
PENDING | No action has yet been taken to install the library. This state should be very short lived.
RESOLVING | Metadata necessary to install the library is being retrieved from the provided repository. For Jar, Egg, and Whl libraries, this step is a no-op.
INSTALLING | The library is actively being installed, either by adding resources to Spark or executing system commands inside the Spark nodes.
INSTALLED | The library has been successfully installed.
SKIPPED | Installation on a Databricks Runtime 7.0 or above cluster was skipped due to Scala version incompatibility.
FAILED | Some step in installation failed. More information can be found in the messages field.
UNINSTALL_ON_RESTART | The library has been marked for removal. Libraries can be removed only when clusters are restarted, so libraries that enter this state will remain until the cluster is restarted.
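Since installation is asynchronous, a client typically polls the status until nothing is still in progress. The sketch below models the statuses as an enum and checks whether every library has settled. Treating INSTALLED, SKIPPED, FAILED, and UNINSTALL_ON_RESTART as settled is an assumption drawn from the descriptions above, and all_settled is a hypothetical helper.

```python
from enum import Enum


class LibraryInstallStatus(Enum):
    PENDING = "PENDING"
    RESOLVING = "RESOLVING"
    INSTALLING = "INSTALLING"
    INSTALLED = "INSTALLED"
    SKIPPED = "SKIPPED"
    FAILED = "FAILED"
    UNINSTALL_ON_RESTART = "UNINSTALL_ON_RESTART"


# Assumption: these statuses do not change again without a cluster restart.
SETTLED = {
    LibraryInstallStatus.INSTALLED,
    LibraryInstallStatus.SKIPPED,
    LibraryInstallStatus.FAILED,
    LibraryInstallStatus.UNINSTALL_ON_RESTART,
}


def all_settled(library_statuses):
    """True when no entry is still PENDING, RESOLVING, or INSTALLING."""
    return all(
        LibraryInstallStatus(entry["status"]) in SETTLED
        for entry in library_statuses
    )


in_progress = [{"status": "INSTALLED"}, {"status": "INSTALLING"}]
finished = [{"status": "INSTALLED"}, {"status": "FAILED"}]
print(all_settled(in_progress), all_settled(finished))
```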