Jobs failing on Databricks Runtime 5.5 LTS with an SQLAlchemy package error

Problem

Azure Databricks jobs that require the third-party library SQLAlchemy are failing. This issue started occurring on or about March 10, 2020.

The error message location differs for job clusters and all-purpose clusters, but it is similar to the following example error message:

Library installation failed for library pypi {
 package: "sqlalchemy"
}
. Error messages:
java.lang.RuntimeException: ManagedLibraryInstallFailed: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, sqlalchemy, --disable-pip-version-check) exited with code 2. ERROR: Exception:

Failure on job clusters

On a job cluster, the error manifests as a failure to start. You can confirm the issue by viewing the job run results and looking for text similar to the example error message.

Failure on all-purpose clusters

If you have all-purpose clusters using Databricks Runtime 5.5 LTS, you can view the error message in the workspace UI.

  1. Click Clusters.
  2. Click the name of your cluster.
  3. Click Libraries.
  4. Click sqlalchemy.
  5. Read the error messages under the Messages heading.

Look for text similar to the example error message.

Version

The problem affects clusters on Databricks Runtime 5.5 LTS using SQLAlchemy 1.3.15.
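If you are not sure which version a cluster pulled in, you can ask pip directly from a notebook. This is a minimal check, assuming %sh cells are available on the cluster; the pip path is the same one shown in the error message.

%sh
# Print the SQLAlchemy version installed in the cluster's Python environment
/databricks/python/bin/pip show sqlalchemy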

Cause

On March 10, 2020, PyPI updated the release version of SQLAlchemy to 1.3.15. If you use PyPI to automatically download the most current version of SQLAlchemy on Databricks Runtime 5.5 LTS clusters, the update to SQLAlchemy may result in job failures.

Solution

There are two workarounds available.

Restrict SQLAlchemy to version 1.3.13

Install SQLAlchemy using the PyPI package installation instructions and set the version value to 1.3.13.
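For example, when installing through the PyPI option in the cluster's Libraries tab, enter the pinned coordinate sqlalchemy==1.3.13 as the package. If you only need the library in a single notebook, a notebook-scoped install is a possible alternative; the sketch below assumes the dbutils.library utilities available on Databricks Runtime 5.5 LTS.

# Notebook-scoped install pinned to a known-good version (sketch)
dbutils.library.installPyPI("sqlalchemy", version="1.3.13")
# Restart the Python process so the pinned version is picked up
dbutils.library.restartPython()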

Prevent Python from using the pep517 build system

You can use an init script to prevent Python from using the pep517 build system.

Use the following code block to generate the init script install-sqlalchemy.sh on your cluster:

# Write an init script that installs SQLAlchemy without the pep517 build system
dbutils.fs.put("/databricks/init-scripts/install-sqlalchemy.sh", """
#!/bin/bash
/databricks/python/bin/pip install sqlalchemy --disable-pip-version-check --no-use-pep517
""", True)

Follow the existing documentation to install the script as a cluster-scoped init script. Restart the cluster after you have installed the script.
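After the cluster restarts, you can confirm that the init script installed a working build before rerunning jobs, for example from a notebook:

# Confirm that SQLAlchemy imports and print the installed version
import sqlalchemy
print(sqlalchemy.__version__)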

Best practice recommendation

Whenever you use third-party libraries, you should always configure your clusters to use specific versions of each library that are known to work. New versions of libraries can offer new features, but they can also introduce problems if they are deployed without testing and validation.