Workspace libraries

Workspace libraries serve as a local repository from which you create cluster-installed libraries. A workspace library might be custom code created by your organization, or a particular version of an open-source library that your organization has standardized on.

You must install a workspace library on a cluster before it can be used in a notebook or job.

Workspace libraries in the Shared folder are available to all users in a workspace, while workspace libraries in a user folder are available only to that user.

Create a workspace library

  1. Right-click the workspace folder where you want to store the library.

  2. Select Create > Library.

    The Create Library dialog displays.

  3. Select the Library Source and follow the appropriate procedure:

Upload a Jar, Python Egg, or Python Wheel

  1. In the Library Source button list, select Upload.
  2. Select Jar, Python Egg, or Python Whl.
  3. Optionally enter a library name.
  4. Drag your Jar, Egg, or Whl to the drop box, or click the drop box and navigate to a file. The file is uploaded to dbfs:/FileStore/jars.
  5. Click Create. The library status screen displays.
  6. Optionally install the library on a cluster.
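
The three file types offered in step 2 correspond to file extensions. The following Python sketch picks the matching dialog option for a local file. The extension-to-option table is an assumption based on the usual .jar, .egg, and .whl suffixes, and library_type_for is a hypothetical helper, not part of any Databricks tool.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to the option shown in the
# Create Library dialog (Jar, Python Egg, or Python Whl).
LIBRARY_TYPES = {
    ".jar": "Jar",
    ".egg": "Python Egg",
    ".whl": "Python Whl",
}


def library_type_for(filename: str) -> str:
    """Return the dialog option matching a local file (hypothetical helper).

    Raises ValueError for files that are not a Jar, Egg, or Wheel.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in LIBRARY_TYPES:
        raise ValueError(f"not a Jar, Egg, or Wheel file: {filename!r}")
    return LIBRARY_TYPES[suffix]


print(library_type_for("my_lib-1.0-py3-none-any.whl"))  # Python Whl
```

The extension check is case-insensitive, so a file named pkg.EGG is still classified as a Python Egg.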

Reference an uploaded Jar, Python Egg, or Python Wheel

If you’ve already uploaded a Jar, Egg, or Wheel to object storage, you can reference it in a workspace library.

You can choose a library in DBFS.

  1. In the Library Source button list, select DBFS.
  2. Select Jar, Python Egg, or Python Whl.
  3. Optionally enter a library name.
  4. Specify the DBFS path to the library.
  5. Click Create. The library status screen displays.
  6. Optionally install the library on a cluster.
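
If you later install such a library on a cluster programmatically rather than through the UI, a DBFS-hosted library is described as a small object keyed by its type. The sketch below assumes the library object shape of the Databricks Libraries API 2.0 ({"jar": ...}, {"egg": ...}, {"whl": ...}). dbfs_library_spec and the example path are hypothetical, and the function only builds the object; it does not call any API.

```python
# Assumed mapping from the dialog option to the key used in a
# Databricks Libraries API 2.0 library object.
SPEC_KEYS = {"Jar": "jar", "Python Egg": "egg", "Python Whl": "whl"}


def dbfs_library_spec(library_type: str, dbfs_path: str) -> dict:
    """Build a library object for a file already stored in DBFS (sketch only).

    Nothing is sent anywhere; the function only validates and shapes its input.
    """
    if library_type not in SPEC_KEYS:
        raise ValueError(f"unknown library type: {library_type!r}")
    if not dbfs_path.startswith("dbfs:/"):
        raise ValueError(f"expected a dbfs:/ path, got {dbfs_path!r}")
    return {SPEC_KEYS[library_type]: dbfs_path}


# Hypothetical path, for illustration only.
print(dbfs_library_spec("Jar", "dbfs:/FileStore/jars/my-lib.jar"))
# {'jar': 'dbfs:/FileStore/jars/my-lib.jar'}
```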

PyPI package

  1. In the Library Source button list, select PyPI.
  2. Enter a PyPI package name. To install a specific version of a library, use this format for the library: <library>==<version>. For example, scikit-learn==0.19.1.
  3. In the Repository field, optionally enter a PyPI repository URL.
  4. Click Create. The library status screen displays.
  5. Optionally install the library on a cluster.
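
The <library>==<version> format from step 2 can be checked before you submit it. This sketch splits a requirement into a name and an optional version and then, assuming the Databricks Libraries API 2.0 "pypi" object shape (package, optional repo), builds the corresponding library object. split_requirement and pypi_spec are hypothetical helpers. They accept only the two forms described here (a bare name or name==version) and deliberately reject other pip version specifiers.

```python
import re
from typing import Optional, Tuple

# A bare package name, optionally followed by ==<version>.
_REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:==([A-Za-z0-9._+!-]+))?$")


def split_requirement(text: str) -> Tuple[str, Optional[str]]:
    """Split 'name' or 'name==version' into (name, version or None)."""
    match = _REQUIREMENT.match(text.strip())
    if match is None:
        raise ValueError(f"expected <library> or <library>==<version>, got {text!r}")
    return match.group(1), match.group(2)


def pypi_spec(package: str, repo: Optional[str] = None) -> dict:
    """Build a Libraries API 2.0-style 'pypi' object (assumed shape)."""
    split_requirement(package)  # validate only; the whole pinned string goes into 'package'
    spec = {"package": package.strip()}
    if repo:
        spec["repo"] = repo
    return {"pypi": spec}


print(split_requirement("scikit-learn==0.19.1"))  # ('scikit-learn', '0.19.1')
print(pypi_spec("scikit-learn==0.19.1"))  # {'pypi': {'package': 'scikit-learn==0.19.1'}}
```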

Maven or Spark package

  1. In the Library Source button list, select Maven.

  2. Specify a Maven coordinate. Do one of the following:

    • In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form groupId:artifactId:version; for example, com.databricks:spark-avro_2.10:1.0.0.
    • If you don’t know the exact coordinate, enter the library name and click Search Packages. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.
      1. Select Maven Central or Spark Packages in the drop-down list at the top left.
      2. Optionally select the package version in the Releases column.
      3. Click + Select next to a package. The Coordinate field is filled in with the selected package and version.
  3. In the Repository field, optionally enter a Maven repository URL.

    Note

    Internal Maven repositories are not supported.

  4. In the Exclusions field, optionally provide the groupId and the artifactId of the dependencies that you want to exclude; for example, log4j:log4j.

  5. Click Create. The library status screen displays.

  6. Optionally install the library on a cluster.
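
The sketch below parses the groupId:artifactId:version coordinate format from step 2 and the groupId:artifactId exclusion format from step 4, then builds a library object assuming the Databricks Libraries API 2.0 "maven" shape (coordinates, optional repo, optional exclusions). All helper names are hypothetical. Coordinates with extra segments, such as a packaging type or classifier, are rejected because this article describes only the three-part form.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def _split(text: str, parts: int, form: str) -> List[str]:
    """Split on ':' and require exactly `parts` non-empty pieces."""
    pieces = text.strip().split(":")
    if len(pieces) != parts or not all(pieces):
        raise ValueError(f"expected {form}, got {text!r}")
    return pieces


def parse_coordinate(text: str) -> MavenCoordinate:
    """Parse 'groupId:artifactId:version'."""
    return MavenCoordinate(*_split(text, 3, "groupId:artifactId:version"))


def parse_exclusion(text: str) -> Tuple[str, str]:
    """Parse an exclusion of the form 'groupId:artifactId'."""
    group_id, artifact_id = _split(text, 2, "groupId:artifactId")
    return group_id, artifact_id


def maven_spec(coordinates: str, repo: Optional[str] = None,
               exclusions: Sequence[str] = ()) -> dict:
    """Build a Libraries API 2.0-style 'maven' object (assumed shape)."""
    parse_coordinate(coordinates)  # validate the coordinate
    for exclusion in exclusions:
        parse_exclusion(exclusion)  # validate each exclusion
    spec = {"coordinates": coordinates.strip()}
    if repo:
        spec["repo"] = repo
    if exclusions:
        spec["exclusions"] = [e.strip() for e in exclusions]
    return {"maven": spec}


coord = parse_coordinate("com.databricks:spark-avro_2.10:1.0.0")
print(coord.artifact_id)  # spark-avro_2.10
print(maven_spec("com.databricks:spark-avro_2.10:1.0.0", exclusions=["log4j:log4j"]))
```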

CRAN package

  1. In the Library Source button list, select CRAN.
  2. In the Package field, enter the name of the package.
  3. In the Repository field, optionally enter the CRAN repository URL.
  4. Click Create. The library detail screen displays.
  5. Optionally install the library on a cluster.
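
A CRAN library needs only a package name and, optionally, a repository URL. This sketch validates the name against CRAN's package naming rule (ASCII letters, digits, and dots; at least two characters; starts with a letter; does not end with a dot) and builds an object assuming the Databricks Libraries API 2.0 "cran" shape (package, optional repo). cran_spec is a hypothetical helper.

```python
import re
from typing import Optional

# CRAN package names: ASCII letters, digits, and dots; at least two
# characters; must start with a letter and must not end with a dot.
_CRAN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]")


def cran_spec(package: str, repo: Optional[str] = None) -> dict:
    """Build a Libraries API 2.0-style 'cran' object (assumed shape)."""
    name = package.strip()
    if _CRAN_NAME.fullmatch(name) is None:
        raise ValueError(f"not a valid CRAN package name: {package!r}")
    spec = {"package": name}
    if repo:
        spec["repo"] = repo
    return {"cran": spec}


print(cran_spec("ggplot2", repo="https://cran.r-project.org"))
```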

Note

CRAN mirrors serve the latest version of a library. As a result, you may end up with different versions of an R package if you attach the library to different clusters at different times. To learn how to manage and pin R package versions on Databricks, see the Knowledge Base.

View workspace library details

  1. Go to the workspace folder containing the library.
  2. Click the library name.

The library details page shows the running clusters and the install status of the library. If the library is installed, the page contains a link to the package host. If the library was uploaded, the page displays a link to the uploaded package file.

Move a workspace library

  1. Go to the workspace folder containing the library.
  2. Click the drop-down arrow to the right of the library name and select Move. A folder browser displays.
  3. Click the destination folder.
  4. Click Select.
  5. Click Confirm and Move.

Delete a workspace library

Important

Before deleting a workspace library, you should uninstall it from all clusters.

To delete a workspace library:

  1. Move the library to the Trash folder.
  2. Either permanently delete the library in the Trash folder or empty the Trash folder.