Libraries fail with dependency exception

Problem

You have a Python function that is defined in a custom egg or wheel file. The function also has dependencies that are satisfied by another custom package installed on the cluster.

When you call this function, it returns an error that says the requirement cannot be satisfied:

org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnvDirs/virtualEnv-d82b31df-1da3-4ee9-864d-8d1fce09c09b/bin/python, /local_disk0/pythonVirtualEnvDirs/virtualEnv-d82b31df-1da3-4ee9-864d-8d1fce09c09b/bin/pip, install, fractal==0.1.0, --disable-pip-version-check) exited with code 1. Could not find a version that satisfies the requirement fractal==0.1.0 (from versions: 0.1.1, 0.1.2, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.3.0)

For example, suppose you have both wheel A and wheel B installed, either to the cluster via the UI or via notebook-scoped libraries, and wheel A has a dependency on wheel B:

  • dbutils.library.install("/path_to_wheel/A.whl")
  • dbutils.library.install("/path_to_wheel/B.whl")

When you call a function from one of these libraries, you get a "requirement cannot be satisfied" error.

Cause

Even though the requirements have been met by installing the required dependencies via the cluster UI or via a notebook-scoped library installation, Azure Databricks cannot guarantee the order in which specific libraries are installed on the cluster. If a library is referenced before its dependency has been distributed to the executor nodes, pip falls back to PyPI to satisfy the requirement locally. If PyPI does not host the required version, the installation fails. In the error above, PyPI offers fractal 0.1.1 and later, but not the pinned version 0.1.0.
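The following is a toy model of why install order matters. It is hypothetical and does not reflect how Databricks actually schedules library installs. It assumes that wheel A pins wheel B at version 0.1.0 and that the public index only offers B 0.1.1 and later, mirroring the fractal versions in the error message. The names `WHEELS`, `PUBLIC_INDEX`, and `install_in_order` are illustrative only.

```python
# Hypothetical toy model of order-dependent installation; not the real installer.

# Versions available on the public index (stands in for PyPI in this model).
PUBLIC_INDEX = {"b": ["0.1.1", "0.1.2", "0.2.1"]}

# Custom wheels: each has a version and pinned requirements on other packages.
WHEELS = {
    "a": {"version": "1.0.0", "requires": {"b": "0.1.0"}},
    "b": {"version": "0.1.0", "requires": {}},
}


def install_in_order(order):
    """Install wheels in the given order, resolving each pin locally first."""
    installed = {}
    for name in order:
        for dep, pinned in WHEELS[name]["requires"].items():
            if installed.get(dep) == pinned:
                continue  # requirement already satisfied by a local install
            # Not installed locally yet: fall back to the public index.
            if pinned not in PUBLIC_INDEX.get(dep, []):
                raise RuntimeError(
                    f"Could not find a version that satisfies the requirement {dep}=={pinned}"
                )
            installed[dep] = pinned
        installed[name] = WHEELS[name]["version"]
    return installed
```

In this model, `install_in_order(["b", "a"])` succeeds because B is already present when A's pin is checked. `install_in_order(["a", "b"])` raises the same kind of "could not find a version" error, because A's pin on B is resolved against the public index before B has been installed.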

Solution

Use a single egg or wheel file that contains all required code and dependencies. This ensures that your code has the correct libraries loaded and available at run time.
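As a rough sketch, one way to produce such a file is to place the source of both packages in a single setuptools project, for example with `packages=find_packages()` picking up both package directories, so they ship in one wheel. The helper below, `inspect_wheel`, is a hypothetical name that uses only the Python standard library. It lists the top-level package directories and the `Requires-Dist` entries of a built wheel. You can use it to check that the dependency's code is inside the wheel and is no longer declared as an external requirement.

```python
import zipfile
from email.parser import Parser


def inspect_wheel(path):
    """Return (top-level package directories, Requires-Dist entries) for a wheel."""
    with zipfile.ZipFile(path) as whl:
        names = whl.namelist()
        # A wheel carries its metadata in <name>-<version>.dist-info/METADATA.
        metadata_name = next(n for n in names if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(whl.read(metadata_name).decode("utf-8"))

    # Collect top-level directories, skipping the .dist-info and .data directories.
    top_level = sorted({
        n.split("/", 1)[0]
        for n in names
        if "/" in n and not n.split("/", 1)[0].endswith((".dist-info", ".data"))
    })
    # METADATA uses email-style headers; get_all returns None if none are present.
    requires = metadata.get_all("Requires-Dist") or []
    return top_level, requires
```

For example, calling `inspect_wheel` on a wheel built from both packages (the file path is up to you) should list both package directories and no `Requires-Dist` line for the bundled package. If a requirement on a package that exists only in your private wheels is still listed, pip may still try to resolve it from PyPI.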