Install NVIDIA GPU drivers on N-series VMs running Windows

To take advantage of the GPU capabilities of Azure N-series VMs backed by NVIDIA GPUs, you must install NVIDIA GPU drivers.

If you choose to install NVIDIA GPU drivers manually, this article lists the supported operating systems and drivers and describes the installation and verification steps. Manual driver setup information is also available for Linux VMs.

For basic specs, storage capacities, and disk details, see GPU Windows VM sizes.

Supported operating systems and drivers

NVIDIA Tesla (CUDA) drivers

NVIDIA Tesla (CUDA) drivers for NCv3-series VMs are supported only on the operating systems listed in the following table. Driver download links are current at the time of publication. For the latest drivers, visit the NVIDIA website.

Tip

As an alternative to manual CUDA driver installation on a Windows Server VM, you can deploy an Azure Data Science Virtual Machine (DSVM) image. The DSVM editions for Windows Server 2016 come with NVIDIA CUDA drivers, the CUDA Deep Neural Network Library, and other tools preinstalled.

OS                        Driver
Windows Server 2016       398.75 (.exe)
Windows Server 2012 R2    398.75 (.exe)

Driver installation

  1. Connect by Remote Desktop to each N-series VM.

  2. Download, extract, and install the supported driver for your Windows operating system.

After CUDA driver installation, a restart is not required.

Verify driver installation

If you installed CUDA drivers, the NVIDIA Control Panel is not visible.

You can verify driver installation in Device Manager. The following example shows successful configuration of the Tesla K80 card on an Azure NC VM.

[Image: GPU driver properties]
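
The same check can be scripted instead of opening Device Manager. The following is a minimal sketch, assuming PowerShell 3.0 or later on the VM. It uses the standard Win32_PnPEntity CIM class, and the manufacturer filter is an assumption about how the driver registers the device, so adjust it if nothing is returned.

```powershell
# List Plug and Play devices whose manufacturer string contains "NVIDIA".
# Assumption: the installed driver reports NVIDIA as the manufacturer.
Get-CimInstance -ClassName Win32_PnPEntity |
    Where-Object { $_.Manufacturer -like "*NVIDIA*" } |
    Select-Object Name, Status, ConfigManagerErrorCode
```

A ConfigManagerErrorCode of 0 means that Windows reports the device as working properly.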

To query the GPU device state, run the nvidia-smi command-line utility installed with the driver.

  1. Open a command prompt and change to the C:\Program Files\NVIDIA Corporation\NVSMI directory.

  2. Run nvidia-smi. If the driver is installed, you will see output similar to the following. GPU-Util shows 0% unless you are currently running a GPU workload on the VM. Your driver version and GPU details may differ from the ones shown.

    [Image: NVIDIA device status]
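
The two steps above can also be run from PowerShell. This is a sketch under the assumption that the driver installed nvidia-smi to the directory named in step 1. The query fields shown (name, driver_version, utilization.gpu) are a small selection chosen for illustration, not a required set.

```powershell
# Path taken from step 1; adjust it if your driver installed nvidia-smi elsewhere.
$nvsmi = "C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"

if (Test-Path $nvsmi) {
    # Print the GPU name, driver version, and current utilization as CSV.
    & $nvsmi --query-gpu=name,driver_version,utilization.gpu --format=csv
} else {
    Write-Warning "nvidia-smi.exe was not found at $nvsmi. The driver may not be installed."
}
```

Running nvidia-smi without arguments instead prints the full status table described in step 2.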

RDMA network connectivity

RDMA network connectivity can be enabled on RDMA-capable N-series VMs, such as NC24r, that are deployed in the same availability set or in a single placement group in a virtual machine scale set. The HpcVmDrivers extension must be added to install the Windows network device drivers that enable RDMA connectivity. To add the VM extension to an RDMA-capable N-series VM, use the Azure PowerShell cmdlets for Azure Resource Manager.

To install the latest version (1.1) of the HpcVmDrivers extension on an existing RDMA-capable VM named myVM in the China North 2 region, run the following command:

Set-AzVMExtension -ResourceGroupName "myResourceGroup" -Location "chinanorth2" -VMName "myVM" -ExtensionName "HpcVmDrivers" -Publisher "Microsoft.HpcCompute" -Type "HpcVmDrivers" -TypeHandlerVersion "1.1"
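
To confirm that the extension was added, you can query it afterward. This sketch assumes the same example resource group and VM names as the command above, and an already authenticated Az PowerShell session.

```powershell
# Retrieve the extension by the name given in -ExtensionName above.
$ext = Get-AzVMExtension -ResourceGroupName "myResourceGroup" -VMName "myVM" -Name "HpcVmDrivers"

# Show the fields most useful for checking the deployment.
$ext | Select-Object Name, Publisher, ExtensionType, TypeHandlerVersion, ProvisioningState
```

A ProvisioningState of Succeeded indicates that the extension deployment completed.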

For more information, see Virtual machine extensions and features for Windows.

The RDMA network supports Message Passing Interface (MPI) traffic for applications running with Microsoft MPI or Intel MPI 5.x.

Next steps

  • Developers building GPU-accelerated applications for NVIDIA Tesla GPUs can also download and install the latest CUDA Toolkit. For more information, see the CUDA Installation Guide.