Install TensorFlow 2.1 on Databricks Runtime 6.5 ML GPU clusters

Databricks Runtime ML includes a preinstalled version of TensorFlow, so you can use it without installing any packages.

| Databricks Runtime ML Version | TensorFlow Version |
|---|---|
| 7.0 | 2.2.0 |
| 6.3 - 6.6 | 1.15.0 |

You can install other versions of TensorFlow by using a cluster-scoped init script.

In this article, you learn how to install TensorFlow 2.1 on Databricks Runtime 6.5 ML GPU clusters.

Important

Removing default libraries and installing new versions may cause instability or completely break your Azure Databricks cluster. You should thoroughly test any new library version in your environment before running production jobs.

Install the init script

  1. Install the following cluster-scoped init script on your Databricks Runtime 6.5 ML GPU cluster.

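    #!/bin/bash
    # Exit immediately if any command fails.
    set -e
    
    # Install the CUDA 10.1 builds of NCCL, cuDNN, and TensorRT (libnvinfer).
    # --allow-downgrades lets apt replace newer preinstalled versions.
    apt-get update
    apt-get install -y --no-install-recommends --allow-downgrades \
      libnccl2=2.4.8-1+cuda10.1 \
      libnccl-dev=2.4.8-1+cuda10.1 \
      cuda-libraries-10-1 \
      libcudnn7=7.6.4.38-1+cuda10.1 \
      libcudnn7-dev=7.6.4.38-1+cuda10.1 \
      libnvinfer6=6.0.1-1+cuda10.1 \
      libnvinfer-dev=6.0.1-1+cuda10.1 \
      libnvinfer-plugin6=6.0.1-1+cuda10.1
    apt-get clean
    # Point the /usr/local/cuda symlink at the CUDA 10.1 directory.
    ln -sfn cuda-10.1 /usr/local/cuda
    
    # Quote the version specifiers so the shell never expands `*` as a glob.
    pip install "tensorflow==2.1.*" "setuptools==41.*" "grpcio==1.24.*"
    
    # This `conda list` is necessary to recognize the pip-installed packages.
    conda list
    conda install cudatoolkit=10.1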
    
  2. Restart the cluster.
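
After the restart, it can be useful to confirm that the cluster picked up the 2.1 series rather than the preinstalled 1.15.0. The following is a minimal sketch, not part of the original procedure: the helper names `is_tf_2_1` and `installed_tf_version` are hypothetical, and the sketch assumes that `python` on the PATH is the interpreter the init script installed into.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the article): succeeds only
# when the given version string belongs to the TensorFlow 2.1 series.
is_tf_2_1() {
  case "$1" in
    2.1|2.1.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the version of the TensorFlow package that
# `python` imports, and fails if TensorFlow cannot be imported at all.
installed_tf_version() {
  python -c 'import tensorflow as tf; print(tf.__version__)'
}
```

On the cluster, for example from a `%sh` notebook cell, the check could be run as `is_tf_2_1 "$(installed_tf_version)" && echo "TensorFlow 2.1 is active"`. Matching on the version prefix, instead of an exact string, mirrors the `2.1.*` specifier used in the init script.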