Cluster cancels Python command execution after installing Bokeh

Problem

The cluster returns Cancelled in a Python notebook. Inspect the driver log (std.err) on the Cluster Configuration page for a stack trace and an error message similar to the following:

log4j:WARN No appenders could be found for logger (com.databricks.conf.trusted.ProjectConf$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See https://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Traceback (most recent call last):
  File "/local_disk0/tmp/1551693540856-0/PythonShell.py", line 30, in <module>
    from IPython.nbconvert.filters.ansi import ansi2html
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/__init__.py", line 6, in <module>
    from . import postprocessors
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/postprocessors/__init__.py", line 6, in <module>
    from .serve import ServePostProcessor
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/postprocessors/serve.py", line 29, in <module>
    class ProxyHandler(web.RequestHandler):
  File "/databricks/python/lib/python3.5/site-packages/IPython/nbconvert/postprocessors/serve.py", line 31, in ProxyHandler
    @web.asynchronous
AttributeError: module 'tornado.web' has no attribute 'asynchronous'

Cause

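When you install the bokeh library, tornado version 6.0a1, an alpha release, is installed by default. The tornado 6.0 series removed the tornado.web.asynchronous decorator, but the notebook's Python shell still imports IPython.nbconvert, which applies that decorator (see the traceback above). The import fails with the AttributeError shown, and the command is cancelled. The solution is to revert to a stable version of tornado that still provides the decorator.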
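To check whether a given tornado version is affected, you can compare its major version against 6. The following is a minimal sketch. The helper names parse_version, lacks_asynchronous_decorator, and installed_tornado_version are hypothetical and not part of tornado or Databricks. The parser handles only plain releases and a/b/rc pre-releases (such as 5.1.1 or 6.0a1), not the full PEP 440 syntax.

```python
import re

# Hypothetical helpers for illustration; only simple X.Y[.Z][a|b|rcN] versions are handled.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?((?:a|b|rc)\d+)?$")


def parse_version(version):
    """Parse '6.0a1' or '5.1.1' into (major, minor, patch, is_prerelease)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unsupported version string: " + version)
    major, minor, patch, suffix = match.groups()
    # A trailing a/b/rc suffix marks an alpha, beta, or release-candidate build.
    return int(major), int(minor), int(patch or 0), suffix is not None


def lacks_asynchronous_decorator(version):
    """True if this tornado version no longer provides tornado.web.asynchronous."""
    major, _minor, _patch, _pre = parse_version(version)
    # The decorator was removed in the 6.0 series, which includes 6.0a1.
    return major >= 6


def installed_tornado_version():
    """Return the installed tornado version string, or None if tornado is absent."""
    try:
        import tornado
    except ImportError:
        return None
    return tornado.version
```

For example, lacks_asynchronous_decorator("6.0a1") is True and lacks_asynchronous_decorator("5.1.1") is False. Calling it on installed_tornado_version() in a notebook reports whether the current cluster is affected.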

Solution

Follow these steps to create a cluster-scoped init script. The script removes the newer version of tornado and installs the stable 5.1.1 release.

  1. If the init script does not already exist, create a base directory to store it:

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the following script:

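    # Write the init script; the third argument (True) overwrites an existing file.
    # The shebang starts the string so that it is the first line of the file.
    dbutils.fs.put("dbfs:/databricks/<directory>/tornado.sh", """#!/bin/bash
    # Remove the tornado pre-release installed with bokeh
    pip uninstall --yes tornado
    rm -rf /home/ubuntu/databricks/python/lib/python3.5/site-packages/tornado*
    rm -rf /databricks/python/lib/python3.5/site-packages/tornado*
    # Reinstall the stable 5.1.1 release
    /usr/bin/yes | /home/ubuntu/databricks/python/bin/pip install tornado==5.1.1
    """, True)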
    
  3. Confirm that the script exists:

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/tornado.sh"))
    
  4. Go to the cluster configuration page and click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab.

  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.
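After the cluster restarts, you can run a notebook cell like the following to confirm that the init script took effect. This is a minimal sketch. The helper check_tornado is hypothetical, and its default "5.1.1" is the version pinned by the script in step 2.

```python
import importlib


def check_tornado(expected_version="5.1.1"):
    """Report the installed tornado version, whether it matches the pin, and whether
    tornado.web.asynchronous (used by the IPython.nbconvert import) is available."""
    try:
        tornado = importlib.import_module("tornado")
        web = importlib.import_module("tornado.web")
    except ImportError:
        # tornado is not installed at all
        return {"installed": None, "matches_pin": False, "has_asynchronous": False}
    return {
        "installed": tornado.version,
        "matches_pin": tornado.version == expected_version,
        "has_asynchronous": hasattr(web, "asynchronous"),
    }


print(check_tornado())
```

If the script ran successfully, installed should be "5.1.1" and has_asynchronous should be True. If has_asynchronous is False, the 6.0 pre-release is still active, so check the init script path and the driver log.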

For more information, see: