To make third-party or custom code available to notebooks and jobs running on your clusters, you can install a library. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries, and you can point to external packages in PyPI, Maven, and CRAN repositories.

This article focuses on performing library tasks in the workspace UI. You can also manage libraries using the Libraries CLI or the Libraries API.
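
As a rough illustration of the Libraries API route, the sketch below only builds the JSON body that an install request would carry; it does not contact a workspace. It assumes the request shape of the Libraries API 2.0 `install` endpoint (`POST /api/2.0/libraries/install`, with a `cluster_id` and a `libraries` list). The helper names, the cluster ID, and the package versions are hypothetical and chosen only for the example.

```python
import json


def pypi_library(package, repo=None):
    """Describe a PyPI package, optionally from a custom index (assumed spec shape)."""
    spec = {"package": package}
    if repo is not None:
        spec["repo"] = repo
    return {"pypi": spec}


def maven_library(coordinates):
    """Describe a Maven artifact by its group:artifact:version coordinates."""
    return {"maven": {"coordinates": coordinates}}


def build_install_request(cluster_id, libraries):
    """Assemble the body for a cluster library install request."""
    return {"cluster_id": cluster_id, "libraries": list(libraries)}


# Hypothetical cluster ID and library versions, for illustration only.
payload = build_install_request(
    "1234-567890-abcde123",
    [
        pypi_library("simplejson==3.8.0"),
        maven_library("org.jsoup:jsoup:1.7.2"),
    ],
)

# Print the body that would be sent with an authenticated POST request.
print(json.dumps(payload, indent=2))
```

Sending this body requires an authenticated HTTP call to your workspace URL; check the Libraries API reference for the exact fields supported by your workspace before relying on this shape.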


Databricks installs many common libraries by default. To see which libraries are installed by default, look at the System Environment subsection of the Databricks Runtime release notes for your Databricks Runtime version.

Libraries can be installed in one of three modes: workspace, cluster-installed, and notebook-scoped.

  • Workspace libraries serve as a local repository from which you create cluster-installed libraries. A workspace library might be custom code created by your organization, or a particular version of an open-source library that your organization has standardized on.

  • Cluster libraries can be used by all notebooks running on a cluster. You can install a cluster library directly from a public repository such as PyPI or Maven, or create one from a previously installed workspace library.

  • Notebook-scoped Python libraries allow you to install Python libraries and create an environment scoped to a notebook session. Notebook-scoped libraries do not affect other notebooks running on the same cluster. These libraries do not persist and must be reinstalled for each session.

    Use notebook-scoped libraries when you need a custom Python environment for a specific notebook. With notebook-scoped libraries, you can also save, reuse, and share Python environments.

    • Notebook-scoped libraries are available via `%pip` and `%conda` magic commands in Databricks Runtime ML 6.4 and above, and via `%pip` magic commands in Databricks Runtime 7.1 and above. See Notebook-scoped Python libraries.
    • Notebook-scoped libraries are available via library utilities in all Databricks Runtime versions. See Library utilities.

This section covers: