Workspace libraries
Workspace libraries serve as a local repository from which you create cluster-installed libraries. A workspace library might be custom code created by your organization, or a particular version of an open-source library that your organization has standardized on.
You must install a workspace library on a cluster before you can use it in a notebook or job.
Workspace libraries in the Shared folder are available to all users in a workspace, while workspace libraries in a user folder are available only to that user.
Create a workspace library
Right-click the workspace folder where you want to store the library.
Select Create > Library.
The Create Library dialog displays.
Select the Library Source and follow the appropriate procedure:
Upload a Jar, Python Egg, or Python Wheel
- In the Library Source button list, select Upload.
- Select Jar, Python Egg, or Python Whl.
- Optionally enter a library name.
- Drag your Jar, Egg, or Whl to the drop box, or click the drop box and navigate to a file. The file is uploaded to `dbfs:/FileStore/jars`.
- Click Create. The library status screen displays.
- Optionally install the library on a cluster.
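An uploaded file can also be referred to outside the UI, for example when installing it on a cluster through the Databricks Libraries API, which describes each file as a small spec keyed by its type. The sketch below assumes the Libraries API 2.0 spec shape (`{"jar": ...}`, `{"egg": ...}`, `{"whl": ...}`); the helper name `spec_from_dbfs_path` is hypothetical, and the code only builds the spec without calling any service. The file name under `dbfs:/FileStore/jars` may differ from your local file name, so copy the path from the library status screen rather than reconstructing it.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to the library type keys used by the
# Databricks Libraries API 2.0 ("jar", "egg", "whl").
_EXTENSION_TO_TYPE = {".jar": "jar", ".egg": "egg", ".whl": "whl"}


def spec_from_dbfs_path(path: str) -> dict:
    """Return a library spec such as {"jar": "dbfs:/FileStore/jars/..."}.

    Hypothetical helper: it checks the dbfs:/ prefix and infers the library
    type from the file extension.
    """
    if not path.startswith("dbfs:/"):
        raise ValueError(f"expected a dbfs:/ path, got {path!r}")
    suffix = PurePosixPath(path).suffix.lower()
    try:
        lib_type = _EXTENSION_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(f"unsupported library file type: {suffix!r}") from None
    return {lib_type: path}


if __name__ == "__main__":
    # Hypothetical path; use the one shown on the library status screen.
    print(spec_from_dbfs_path("dbfs:/FileStore/jars/1234-mylib.jar"))
```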
Reference an uploaded Jar, Python Egg, or Python Wheel
If you've already uploaded a Jar, Egg, or Wheel to object storage, you can reference it in a workspace library.
You can choose a library in DBFS.
- In the Library Source button list, select DBFS.
- Select Jar, Python Egg, or Python Whl.
- Optionally enter a library name.
- Specify the DBFS path to the library.
- Click Create. The library status screen displays.
- Optionally install the library on a cluster.
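The final step, installing a DBFS-hosted library on a cluster, can also be scripted. The sketch below assumes the Libraries API 2.0 endpoint `POST /api/2.0/libraries/install` with a JSON body of the form `{"cluster_id": ..., "libraries": [...]}`; `host`, `token`, and the cluster ID are placeholders you supply. It installs a cluster library directly and does not create a workspace library. Only `build_install_body` runs offline; `install_libraries` makes a real HTTP request.

```python
import json
import urllib.request


def build_install_body(cluster_id: str, libraries: list) -> bytes:
    """Serialize an install request body in the assumed Libraries API 2.0 shape."""
    if not libraries:
        raise ValueError("at least one library spec is required")
    body = {"cluster_id": cluster_id, "libraries": libraries}
    return json.dumps(body).encode("utf-8")


def install_libraries(host: str, token: str, cluster_id: str, libraries: list) -> int:
    """POST the body to /api/2.0/libraries/install and return the HTTP status.

    host is your workspace URL and token a personal access token (both
    placeholders); the request is sent with bearer authentication.
    """
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=build_install_body(cluster_id, libraries),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical cluster ID and DBFS path; this only prints the body.
    body = build_install_body(
        "0123-456789-abcde123",
        [{"whl": "dbfs:/mnt/libs/mylib-0.1-py3-none-any.whl"}],
    )
    print(body.decode("utf-8"))
```

Keeping body construction separate from the network call makes the payload easy to inspect or test before anything is sent.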
PyPI package
- In the Library Source button list, select PyPI.
- Enter a PyPI package name. To install a specific version of a library, use this format for the library: `<library>==<version>`. For example, `scikit-learn==0.19.1`.
- In the Repository field, optionally enter a PyPI repository URL.
- Click Create. The library status screen displays.
- Optionally install the library on a cluster.
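The `<library>==<version>` format can be checked before you enter it. The sketch below accepts only the two forms described above (a bare name or an exact `==` pin); the regular expression is a deliberate simplification, not the full PEP 508 requirement grammar. It also builds a PyPI spec in the assumed Libraries API 2.0 shape `{"pypi": {"package": ..., "repo": ...}}`, where `repo` is optional.

```python
import re
from typing import Optional, Tuple

# Simplified pattern: "name" or "name==version". Other specifiers such as
# ">=" are rejected on purpose because the UI format above uses "==".
_REQUIREMENT = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)(?:==(?P<version>[^\s=]+))?$"
)


def split_requirement(requirement: str) -> Tuple[str, Optional[str]]:
    """Return (name, version) where version is None for an unpinned package."""
    match = _REQUIREMENT.match(requirement.strip())
    if match is None:
        raise ValueError(f"expected <library> or <library>==<version>, got {requirement!r}")
    return match.group("name"), match.group("version")


def pypi_spec(requirement: str, repo: Optional[str] = None) -> dict:
    """Build a PyPI library spec; repo is an optional custom index URL."""
    split_requirement(requirement)  # validate only
    pypi = {"package": requirement.strip()}
    if repo:
        pypi["repo"] = repo
    return {"pypi": pypi}


if __name__ == "__main__":
    print(split_requirement("scikit-learn==0.19.1"))
    print(pypi_spec("scikit-learn==0.19.1"))
```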
Maven or Spark package
In the Library Source button list, select Maven.
Specify a Maven coordinate. Do one of the following:
- In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form `groupId:artifactId:version`; for example, `com.databricks:spark-avro_2.10:1.0.0`.
- If you don't know the exact coordinate, enter the library name and click Search Packages. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.
  - Select Maven Central or Spark Packages in the drop-down list at the top left.
  - Optionally select the package version in the Releases column.
  - Click + Select next to a package. The Coordinate field is filled in with the selected package and version.
In the Repository field, optionally enter a Maven repository URL.
Note
Internal Maven repositories are not supported.
In the Exclusions field, optionally provide the `groupId` and the `artifactId` of the dependencies that you want to exclude; for example, `log4j:log4j`.
Click Create. The library status screen displays.
Optionally install the library on a cluster.
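The Coordinate and Exclusions fields use two colon-separated formats: `groupId:artifactId:version` for the library and `groupId:artifactId` for each exclusion. The sketch below validates both and builds a spec in the assumed Libraries API 2.0 shape `{"maven": {"coordinates": ..., "repo": ..., "exclusions": [...]}}`. It enforces exactly three parts, matching the form described above; real Maven coordinates can also carry packaging or classifier parts, which this simplification rejects.

```python
from typing import List, Optional


def _check_parts(value: str, expected: int, form: str) -> None:
    """Raise ValueError unless value has `expected` non-empty colon-separated parts."""
    parts = value.strip().split(":")
    if len(parts) != expected or not all(parts):
        raise ValueError(f"expected {form}, got {value!r}")


def maven_spec(
    coordinate: str,
    repo: Optional[str] = None,
    exclusions: Optional[List[str]] = None,
) -> dict:
    """Build a Maven library spec after validating coordinate and exclusions."""
    _check_parts(coordinate, 3, "groupId:artifactId:version")
    for exclusion in exclusions or []:
        _check_parts(exclusion, 2, "groupId:artifactId")
    maven = {"coordinates": coordinate.strip()}
    if repo:
        maven["repo"] = repo
    if exclusions:
        maven["exclusions"] = [e.strip() for e in exclusions]
    return {"maven": maven}


if __name__ == "__main__":
    print(maven_spec("com.databricks:spark-avro_2.10:1.0.0", exclusions=["log4j:log4j"]))
```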
CRAN package
- In the Library Source button list, select CRAN.
- In the Package field, enter the name of the package.
- In the Repository field, optionally enter the CRAN repository URL.
- Click Create. The library detail screen displays.
- Optionally install the library on a cluster.
Note
CRAN mirrors serve the latest version of a library. As a result, you may end up with different versions of an R package if you attach the library to different clusters at different times. To learn how to manage and pin R package versions on Databricks, see the Knowledge Base.
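A CRAN library is described only by a package name and an optional repository URL, so nothing in the spec fixes the version; the mirror decides what gets installed. The sketch below builds a spec in the assumed Libraries API 2.0 shape `{"cran": {"package": ..., "repo": ...}}` and, as an illustrative assumption, rejects repository URLs that are not `http` or `https`.

```python
from typing import Optional
from urllib.parse import urlparse


def cran_spec(package: str, repo: Optional[str] = None) -> dict:
    """Build a CRAN library spec; note that there is no version field."""
    name = package.strip()
    if not name:
        raise ValueError("package name must not be empty")
    cran = {"package": name}
    if repo:
        parsed = urlparse(repo)
        # Assumption for this sketch: only http(s) mirrors are accepted.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"repository must be an http(s) URL, got {repo!r}")
        cran["repo"] = repo
    return {"cran": cran}


if __name__ == "__main__":
    print(cran_spec("ada", "https://cran.us.r-project.org"))
```

Pointing `repo` at a dated snapshot mirror is one common way to make repeated installs resolve to the same versions, but whether that suits your workspace is a separate decision.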
View workspace library details
- Go to the workspace folder containing the library.
- Click the library name.
The library details page shows the running clusters and the install status of the library on each. If the library is installed, the page contains a link to the package host. If the library was uploaded, the page displays a link to the uploaded package file.
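A per-cluster view of install status is also available from the Libraries API endpoint `GET /api/2.0/libraries/cluster-status?cluster_id=...`. The sketch below assumes its response carries a `library_statuses` list whose entries have `library`, `status` (for example `INSTALLED` or `FAILED`), and `messages` fields. It only post-processes a response you have already fetched, and the sample data is hand-written for illustration, not real output.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hand-written sample in the assumed response shape (not real API output).
SAMPLE_STATUS = {
    "cluster_id": "0123-456789-abcde123",
    "library_statuses": [
        {"library": {"jar": "dbfs:/FileStore/jars/1234-mylib.jar"}, "status": "INSTALLED"},
        {"library": {"pypi": {"package": "not-a-real-package"}}, "status": "FAILED",
         "messages": ["hypothetical error text"]},
    ],
}


def summarize_statuses(cluster_status: dict) -> Dict[str, int]:
    """Count libraries per install status; missing statuses count as UNKNOWN."""
    counts = Counter(
        entry.get("status", "UNKNOWN")
        for entry in cluster_status.get("library_statuses", [])
    )
    return dict(counts)


def failed_libraries(cluster_status: dict) -> List[Tuple[dict, list]]:
    """Return (library spec, messages) pairs for entries whose status is FAILED."""
    return [
        (entry.get("library", {}), entry.get("messages", []))
        for entry in cluster_status.get("library_statuses", [])
        if entry.get("status") == "FAILED"
    ]


if __name__ == "__main__":
    print(summarize_statuses(SAMPLE_STATUS))
    print(failed_libraries(SAMPLE_STATUS))
```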
Move a workspace library
- Go to the workspace folder containing the library.
- Click the drop-down arrow to the right of the library name and select Move. A folder browser displays.
- Click the destination folder.
- Click Select.
- Click Confirm and Move.
Delete a workspace library
Important
Before deleting a workspace library, you should uninstall it from all clusters.
To delete a workspace library:
- Move the library to the Trash folder.
- Either permanently delete the library in the Trash folder or empty the Trash folder.
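Uninstalling from every cluster before deletion can be planned from the install status of each cluster. The sketch below assumes the Libraries API 2.0 endpoint `POST /api/2.0/libraries/uninstall`, which takes the same `{"cluster_id": ..., "libraries": [...]}` body as install; `installed_by_cluster` is a hypothetical mapping you would assemble yourself, for example from cluster-status responses. It only builds the request bodies. In our understanding, an uninstall typically takes effect when the cluster next restarts, so confirm the status before deleting the library.

```python
from typing import Dict, List


def uninstall_bodies(library: dict, installed_by_cluster: Dict[str, List[dict]]) -> List[dict]:
    """Build one uninstall request body per cluster that still has `library`.

    installed_by_cluster maps a cluster ID to the library specs installed on
    it. Clusters are visited in sorted order so the output is deterministic.
    """
    return [
        {"cluster_id": cluster_id, "libraries": [library]}
        for cluster_id, specs in sorted(installed_by_cluster.items())
        if library in specs
    ]


if __name__ == "__main__":
    # Hypothetical cluster IDs and specs.
    installed = {
        "cluster-b": [{"jar": "dbfs:/FileStore/jars/1234-mylib.jar"}],
        "cluster-a": [{"pypi": {"package": "simplejson"}}],
    }
    print(uninstall_bodies({"jar": "dbfs:/FileStore/jars/1234-mylib.jar"}, installed))
```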