Error when installing pyodbc on a cluster

Problem

One of the following errors occurs when you use pip to install the pyodbc library.

java.lang.RuntimeException: Installation failed with message: Collecting pyodbc
"Library installation is failing due to missing dependencies. sasl and thrift_sasl are optional dependencies for SASL or Kerberos support"

Cause

Although sasl and thrift_sasl are optional dependencies for SASL or Kerberos support, they must be present for the pyodbc installation to succeed.

Solution

Set up the solution in a single notebook

  1. In the notebook, check the version of thrift and upgrade to the latest version.

    %sh
    pip list | egrep 'thrift-sasl|sasl'
    pip install --upgrade thrift
    
  2. Ensure that the dependent packages are installed.

    %sh dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
    
  3. Install unixodbc before installing pyodbc.

    %sh sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
    
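The `pip list | egrep` check in step 1 can also be done programmatically. The following is a minimal sketch; the helper name `missing_sasl_packages` is hypothetical and not part of pyodbc or Databricks.

```python
# Hypothetical helper: scan `pip list`-style output for the optional
# SASL dependencies that the pyodbc installation needs.
import subprocess
import sys

def missing_sasl_packages(pip_list_output):
    """Return the required package names absent from `pip list` output."""
    required = ["sasl", "thrift-sasl"]
    installed = {line.split()[0].lower()
                 for line in pip_list_output.splitlines() if line.strip()}
    return [pkg for pkg in required if pkg not in installed]

# Check the current environment the same way `pip list | egrep ...` does.
output = subprocess.run([sys.executable, "-m", "pip", "list"],
                        capture_output=True, text=True).stdout
print(missing_sasl_packages(output))
```

An empty list means both optional dependencies are already present; otherwise the list names what still needs to be installed before pyodbc.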

Set up the solution as a cluster-scoped init script

You can put these commands into a single init script and attach it to the cluster. This ensures that the dependent libraries for pyodbc are installed before the cluster starts.

  1. Create the base directory to store the init script in, if the base directory does not exist. Here, use dbfs:/databricks/<directory> as an example.

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the script and save it to a file.

    dbutils.fs.put("dbfs:/databricks/<directory>/tornado.sh","""
    #!/bin/bash
    pip list | egrep 'thrift-sasl|sasl'
    pip install --upgrade thrift
    dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
    sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
    """,True)
    
  3. Check that the script exists.

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/tornado.sh"))
    
  4. On the cluster configuration page, click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab.


  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.
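Outside of a Databricks notebook, `dbutils` is not available, but steps 2 and 3 above can be sketched with the standard library. This is a local stand-in only: `/tmp/tornado.sh` is an assumed path replacing `dbfs:/databricks/<directory>/tornado.sh`.

```python
# Local stand-in for dbutils.fs.put / dbutils.fs.ls: write the init
# script to a file and confirm it exists.
from pathlib import Path

script = """#!/bin/bash
pip list | egrep 'thrift-sasl|sasl'
pip install --upgrade thrift
dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
"""

path = Path("/tmp/tornado.sh")          # assumed local path
path.write_text(script)                 # equivalent of dbutils.fs.put(..., True)
path.chmod(0o755)                       # init scripts must be executable

# Equivalent of display(dbutils.fs.ls(...)): confirm the file is there.
print(path.name, path.stat().st_size, "bytes")
```

On Databricks itself, use the `dbutils.fs.put` and `dbutils.fs.ls` commands shown in steps 2 and 3 so the script lands on DBFS where the cluster can read it.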

For more details about cluster-scoped init scripts, see Cluster-scoped init scripts.