Cluster cancels Python command execution due to library conflict

Problem

The cluster returns Cancelled in a Python notebook. Notebooks in all other languages execute successfully on the same cluster.

Cause

When you install a conflicting version of a library such as ipython, ipywidgets, numpy, scipy, or pandas to the PYTHONPATH, the Python REPL can break, causing all commands to return Cancelled after 30 seconds. This also breaks %sh, the notebook macro that lets you enter shell scripts in Python notebook cells.
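
To make version comparisons concrete, the sketch below lists the installed versions of the libraries named above using the standard-library importlib.metadata module. It assumes it is run somewhere the Python REPL still works (for example, a healthy cluster on the same runtime), and the names SUSPECT_LIBRARIES and installed_versions are illustrative, not part of any Databricks API.

```python
from importlib.metadata import PackageNotFoundError, version

# Libraries named in this article as common sources of conflicts.
SUSPECT_LIBRARIES = ["ipython", "ipywidgets", "numpy", "scipy", "pandas"]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(SUSPECT_LIBRARIES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Recording this output from a working environment gives a baseline to compare against when deciding which version of a library to reinstall.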

Solution

To solve this problem, do the following:

  1. Identify the conflicting library and uninstall it.
  2. Install the correct version of the library in a notebook or with a cluster-scoped init script.

Identify the conflicting library

  1. Uninstall the libraries one at a time, checking after each removal whether the Python REPL still breaks.
  2. If the REPL still breaks, reinstall the library you removed and remove the next one.
  3. When you find the library that causes the REPL to break, install the correct version of that library using one of the two methods below.
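
The elimination procedure above can be sketched as a small loop. This is only an illustration of the logic: the uninstall, reinstall, and repl_is_broken callables are hypothetical stand-ins for the manual steps you perform on the cluster, and the in-memory simulation below is made up for demonstration.

```python
def find_conflicting_library(libraries, uninstall, reinstall, repl_is_broken):
    """Remove one library at a time; return the first whose removal fixes the REPL.

    Returns None if the REPL stays broken after every library has been tried.
    """
    for lib in libraries:
        uninstall(lib)
        if not repl_is_broken():
            return lib      # removing this library fixed the REPL
        reinstall(lib)      # not the culprit: put it back and try the next one
    return None


# Simulated environment: the REPL is "broken" while the culprit is installed.
installed = {"lib-a", "lib-b", "lib-c"}
culprit = "lib-b"

found = find_conflicting_library(
    sorted(installed),
    uninstall=installed.discard,
    reinstall=installed.add,
    repl_is_broken=lambda: culprit in installed,
)
print(found)
```

Note that libraries that turn out not to be the cause are reinstalled before the next one is tried, so only the conflicting library is left uninstalled at the end.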

You can also inspect the driver log (std.err) for the cluster (on the Cluster Configuration page) for a stack trace and error message that can help identify the library conflict.
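
If the driver log is long, a short script can help pick out import failures. The sketch below assumes you have copied the log text into a Python string and that the conflict surfaces as a standard Python ModuleNotFoundError or ImportError message; other failure modes will not be matched. The names IMPORT_ERROR_PATTERNS and suspect_modules are illustrative.

```python
import re

# Standard CPython import-failure messages; whether the driver log contains
# them for a given conflict is an assumption.
IMPORT_ERROR_PATTERNS = [
    re.compile(r"ModuleNotFoundError: No module named '([^']+)'"),
    re.compile(r"ImportError: cannot import name '[^']+' from '([^']+)'"),
]


def suspect_modules(log_text):
    """Return top-level module names mentioned in import errors, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        for pattern in IMPORT_ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                top_level = match.group(1).split(".")[0]
                if top_level not in seen:
                    seen.append(top_level)
    return seen
```

Calling suspect_modules on the pasted log text returns a list of candidate libraries to try uninstalling first.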

Install the correct library

Do one of the following.

Option 1: Install in a notebook using pip3

    %sh
    sudo apt-get -y install python3-pip
    pip3 install <library-name>

Option 2: Install using a cluster-scoped init script

Follow the steps below to create a cluster-scoped init script that installs the correct version of the library. Replace <library-name> in the examples with the filename of the library to install.

  1. If the init script does not already exist, create a base directory to store it:

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the following script:

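    # Write the init script to DBFS; the final True overwrites an existing file.
    # The shebang must be the first line of the script, so the string starts with it.
    dbutils.fs.put("dbfs:/databricks/<directory>/<library-name>.sh", """#!/bin/bash
    sudo apt-get -y install python3-pip
    sudo pip3 install <library-name>
    """, True)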
    
  3. Confirm that the script exists:

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/<library-name>.sh"))
    
  4. Go to the cluster configuration page and click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab.

  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.
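
Because the goal is to install the correct version of the library, it can help to pin that version in the init script. The sketch below builds the script text with pip's `==` version specifier; the helper name build_init_script and the <version> placeholder are illustrative assumptions, not something the steps above prescribe.

```python
def build_init_script(library, version=None):
    """Return init script text that installs `library`, pinned to `version` if given."""
    requirement = f"{library}=={version}" if version else library
    lines = [
        "#!/bin/bash",  # shebang on the first line of the script
        "sudo apt-get -y install python3-pip",
        f"sudo pip3 install {requirement}",
    ]
    return "\n".join(lines) + "\n"


# Placeholders kept as-is; substitute a real library name and version.
script_text = build_init_script("<library-name>", "<version>")
print(script_text)
```

In a notebook, the resulting text could be passed as the second argument to dbutils.fs.put in step 2 in place of the inline string.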