Library unavailability causing job failures

This article explains an ImportError you may encounter when launching jobs that import external libraries.

Problem

When a job causes a node to restart, the job fails with the following error message:

ImportError: No module named XXX

Cause

The Cluster Manager is the part of the Azure Databricks service that manages customer Apache Spark clusters. When it restarts a node, it sends commands to install Python and R libraries on that node. Library installation, or downloading artifacts from the internet, can sometimes take longer than expected, either because of network latency or because the library being attached to the cluster has many dependencies.

The library installation mechanism is meant to guarantee that a notebook can import installed libraries once it attaches to a cluster. When library installation through PyPI takes too long, however, the notebook attaches to the cluster before the installation completes. In this case, the notebook is unable to import the library.
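The race described above can be observed from a notebook or job by checking whether a module is importable yet. The following is a minimal sketch and not part of the Databricks library mechanism: wait_for_module is a hypothetical helper, and the timeout and polling interval are arbitrary assumptions.

```python
import importlib
import importlib.util
import time


def wait_for_module(name, timeout_s=300.0, poll_s=5.0):
    """Return True once `name` is importable, or False after `timeout_s` seconds.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Drop cached directory listings so newly installed packages are seen.
        importlib.invalidate_caches()
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is missing or only partly installed.
            found = False
        if found:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A job could call wait_for_module("mlflow") before its first import and raise a more descriptive error if the result is False. This only detects packages that appear on the interpreter's existing search path.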

Solution

Method 1

Use notebook-scoped library installation commands in the notebook. Enter the following commands in a single cell to ensure that all of the specified libraries are installed.

dbutils.library.installPyPI("mlflow")
dbutils.library.restartPython()
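
When several libraries are needed, the same two calls can be wrapped in a small helper. This is a sketch under stated assumptions: install_notebook_libraries is a hypothetical name, and the dbutils object is passed in explicitly so the function can be exercised outside a notebook.

```python
def install_notebook_libraries(dbutils, packages):
    """Install each (name, version) pair, then restart Python once.

    `packages` is an iterable of (pypi_name, version) tuples; pass "" as
    the version to leave it unpinned. Hypothetical helper for illustration.
    """
    for name, version in packages:
        # installPyPI takes the package name and an optional version string.
        dbutils.library.installPyPI(name, version=version)
    # Restart once, after all installs, so the new packages become importable.
    dbutils.library.restartPython()


# Usage in a notebook cell (dbutils is provided by the notebook environment):
# install_notebook_libraries(dbutils, [("mlflow", "")])
```

Restarting once at the end, rather than after every package, keeps the cell to a single Python restart. Because the restart clears Python state, the cell should run before any code that defines variables the notebook needs later.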

Method 2

To avoid delays when downloading libraries from internet repositories, you can cache the libraries in DBFS or Azure Blob Storage.

For example, you can download the wheel or egg file for a Python library to a DBFS or Azure Blob Storage location. You can then use the REST API or cluster-scoped init scripts to install libraries from DBFS or Azure Blob Storage.

First, download the wheel or egg file from the internet to the DBFS or Azure Blob Storage location. You can do this in a notebook as follows:

%sh
cd /dbfs/mnt/library
wget <whl/egg file location from the pypi repository>

After the wheel or egg file download completes, you can install the library on the cluster using the REST API, the UI, or an init script.
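
As one illustration of the REST API route, the sketch below builds a request for the Libraries API install endpoint using only the Python standard library. The workspace URL, token, cluster ID, and file name are all placeholders, build_install_request is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_install_request(host, token, cluster_id, library_path):
    """Build (but do not send) a Libraries API install request.

    `library_path` is a DBFS URI such as dbfs:/mnt/library/<file>.whl.
    Hypothetical helper for illustration.
    """
    # The request body uses a different key for wheel and egg files.
    kind = "egg" if library_path.endswith(".egg") else "whl"
    payload = {"cluster_id": cluster_id, "libraries": [{kind: library_path}]}
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (all values are placeholders):
# req = build_install_request("https://<workspace-url>", "<token>",
#                             "<cluster-id>", "dbfs:/mnt/library/<file>.whl")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

A file downloaded to /dbfs/mnt/library in the shell step is addressed as dbfs:/mnt/library/<file> in the request body.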