Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS)

Graphics processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads. AKS supports the creation of GPU-enabled node pools to run these compute-intensive workloads in Kubernetes. For more information on available GPU-enabled VMs, see GPU optimized VM sizes in Azure. For AKS nodes, we recommend a minimum size of Standard_NC6s_v3.
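
Before you start, you can check which NC-series sizes are offered in your region with the Azure CLI. This is a quick sketch; chinaeast2 is just an example region:

az vm list-sizes --location chinaeast2 --output table --query "[?contains(name, 'Standard_NC')]"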

Note

GPU-enabled VMs contain specialized hardware that is subject to higher pricing and region availability. For more information, see the pricing tool and region availability.

Currently, GPU-enabled node pools are available only for Linux node pools.

Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS) provides detailed steps for running GPU workloads on an AKS cluster, but some configurations need to be changed for Azure China. For example, the following Docker Hub images should be changed to use dockerhub.azk8s.cn:

Original image in doc                   Supported image on Azure China
nvidia/k8s-device-plugin:1.11           dockerhub.azk8s.cn/nvidia/k8s-device-plugin:1.11
microsoft/samples-tf-mnist-demo:gpu     dockerhub.azk8s.cn/microsoft/samples-tf-mnist-demo:gpu

Below are detailed steps for running a GPU workload on an Azure China AKS cluster:

Before you begin

This article assumes that you have an existing AKS cluster with nodes that support GPUs. Your AKS cluster must run Kubernetes 1.10 or later. If you need an AKS cluster that meets these requirements, see the first section of this article to create one.

You also need the Azure CLI version 2.0.64 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.

Make sure that your Azure subscription can create NCv3-series VMs such as Standard_NC6s_v3. If it can't, file a support ticket to enable that VM size for your Azure subscription.
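
You can check the current vCPU quota for each VM family in a region with the following command (shown here for chinaeast2; look for the NCSv3 family row in the output):

az vm list-usage --location chinaeast2 --output table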

Create an AKS cluster

If you need an AKS cluster that meets the minimum requirements (a GPU-enabled node and Kubernetes version 1.10 or later), complete the following steps. If you already have an AKS cluster that meets these requirements, skip to the next section.

First, create a resource group for the cluster using the az group create command. The following example creates a resource group named myResourceGroup in the chinaeast2 region:

az group create --name myResourceGroup --location chinaeast2

Now create an AKS cluster using the az aks create command. The following example creates a cluster with a single node of size Standard_NC6s_v3:

az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-vm-size Standard_NC6s_v3 \
    --node-count 1
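
If you already have an AKS cluster and only need GPU capacity, you can instead add a GPU-enabled node pool to it. This is a sketch; gpunodepool is a hypothetical pool name:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunodepool \
    --node-vm-size Standard_NC6s_v3 \
    --node-count 1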

Get the credentials for your AKS cluster using the az aks get-credentials command:

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

Install NVIDIA device plugin

Before the GPUs in the nodes can be used, you must deploy a DaemonSet for the NVIDIA device plugin. This DaemonSet runs a pod on each node to provide the required drivers for the GPUs.

First, create a namespace using the kubectl create namespace command, such as gpu-resources:

kubectl create namespace gpu-resources

Create a file named nvidia-device-plugin-ds.yaml and paste the following YAML manifest. This manifest is provided as part of the NVIDIA device plugin for Kubernetes project.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: gpu-resources
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure.  This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - image: dockerhub.azk8s.cn/nvidia/k8s-device-plugin:1.11
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins

Now use the kubectl apply command to create the DaemonSet and confirm that the NVIDIA device plugin is created successfully, as shown in the following example output:

$ kubectl apply -f nvidia-device-plugin-ds.yaml

daemonset "nvidia-device-plugin" created

Confirm that GPUs are schedulable

With your AKS cluster created, confirm that GPUs are schedulable in Kubernetes. First, list the nodes in your cluster using the kubectl get nodes command:

$ kubectl get nodes

NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-28993262-0   Ready    agent   13m   v1.12.7

Now use the kubectl describe node command to confirm that the GPUs are schedulable. Under the Capacity section, the GPU should be listed as nvidia.com/gpu: 1.

The following condensed example shows that a GPU is available on the node named aks-nodepool1-28993262-0:

$ kubectl describe node aks-nodepool1-28993262-0

Name:               aks-nodepool1-28993262-0
Roles:              agent
Labels:             accelerator=nvidia

[...]

Capacity:
 attachable-volumes-azure-disk:  24
 cpu:                            6
 ephemeral-storage:              101584140Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         57713784Ki
 nvidia.com/gpu:                 1
 pods:                           110
Allocatable:
 attachable-volumes-azure-disk:  24
 cpu:                            5916m
 ephemeral-storage:              93619943269
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         51702904Ki
 nvidia.com/gpu:                 1
 pods:                           110
System Info:
 Machine ID:                 b0cd6fb49ffe4900b56ac8df2eaa0376
 System UUID:                486A1C08-C459-6F43-AD6B-E9CD0F8AEC17
 Boot ID:                    f134525f-385d-4b4e-89b8-989f3abb490b
 Kernel Version:             4.15.0-1040-azure
 OS Image:                   Ubuntu 16.04.6 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.12.7
 Kube-Proxy Version:         v1.12.7
PodCIDR:                     10.244.0.0/24
ProviderID:                  azure:///subscriptions/<guid>/resourceGroups/MC_myResourceGroup_myAKSCluster_chinaeast2/providers/Microsoft.Compute/virtualMachines/aks-nodepool1-28993262-0
Non-terminated Pods:         (9 in total)
  Namespace                  Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                     ------------  ----------  ---------------  -------------  ---
  kube-system                nvidia-device-plugin-daemonset-bbjlq     0 (0%)        0 (0%)      0 (0%)           0 (0%)         2m39s

[...]
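
As a quicker check across all nodes, you can also query just the allocatable GPU count with a JSONPath expression (the backslash escapes the dots in the nvidia.com/gpu resource name):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'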

Install GPU plugin

Alternatively, you can install the NVIDIA device plugin in a single step by applying a manifest preconfigured for Azure China:

kubectl create -f https://raw.githubusercontent.com/andyzhangx/demo/master/linux/gpu/nvidia-device-plugin-ds-mooncake.yaml

Run a GPU-enabled workload

To see the GPU in action, schedule a GPU-enabled workload with the appropriate resource request. In this example, let's run a TensorFlow job against the MNIST dataset.

Create a file named samples-tf-mnist-demo.yaml and paste the following YAML manifest. The job manifest includes a resource limit of nvidia.com/gpu: 1:

Note

If you receive a version mismatch error when calling into drivers, such as "CUDA driver version is insufficient for CUDA runtime version", review the NVIDIA driver compatibility matrix at https://docs.nvidia.com/deploy/cuda-compatibility/index.html.
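
To see which driver version a node actually exposes, you can run nvidia-smi in a one-off pod. This is a sketch; it assumes the dockerhub.azk8s.cn mirror serves the nvidia/cuda:9.0-base image from Docker Hub:

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi-check
spec:
  restartPolicy: Never
  containers:
  - name: nvidia-smi
    # Assumed mirrored image; pick a CUDA tag compatible with your node's driver.
    image: dockerhub.azk8s.cn/nvidia/cuda:9.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1

After the pod completes, kubectl logs nvidia-smi-check shows the driver version and the GPUs visible to the container. The manifest for the TensorFlow sample job follows: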

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo
  name: samples-tf-mnist-demo
spec:
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo
    spec:
      containers:
      - name: samples-tf-mnist-demo
        image: dockerhub.azk8s.cn/microsoft/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
           nvidia.com/gpu: 1
      restartPolicy: OnFailure

Use the kubectl apply command to run the job. This command parses the manifest file and creates the defined Kubernetes objects:

kubectl apply -f samples-tf-mnist-demo.yaml

Or create the job directly from the preconfigured Azure China manifest:

kubectl create -f https://raw.githubusercontent.com/andyzhangx/demo/master/linux/gpu/gpu-demo-mooncake.yaml

View the status and output of the GPU-enabled workload

Monitor the progress of the job using the kubectl get jobs command with the --watch argument. It may take a few minutes to first pull the image and process the dataset. When the COMPLETIONS column shows 1/1, the job has finished successfully. Exit the kubectl --watch command with Ctrl-C:

$ kubectl get jobs samples-tf-mnist-demo --watch

NAME                    COMPLETIONS   DURATION   AGE

samples-tf-mnist-demo   0/1           3m29s      3m29s
samples-tf-mnist-demo   1/1   3m10s   3m36s

To look at the output of the GPU-enabled workload, first get the name of the pod with the kubectl get pods command:

$ kubectl get pods --selector app=samples-tf-mnist-demo

NAME                          READY   STATUS      RESTARTS   AGE
samples-tf-mnist-demo-mtd44   0/1     Completed   0          4m39s
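
Because the pod name carries a random suffix, it can be convenient to capture it in a shell variable instead of copying it by hand (a small convenience sketch):

POD=$(kubectl get pods --selector app=samples-tf-mnist-demo --output jsonpath='{.items[0].metadata.name}')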

Now use the kubectl logs command to view the pod logs. The following example pod logs confirm that the appropriate GPU device, Tesla K80, has been discovered. Provide the name of your own pod:

$ kubectl logs samples-tf-mnist-demo-mtd44

2019-05-16 16:08:31.258328: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-16 16:08:31.396846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 2fd7:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-16 16:08:31.396886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 2fd7:00:00.0, compute capability: 3.7)
2019-05-16 16:08:36.076962: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/input_data/t10k-labels-idx1-ubyte.gz
Accuracy at step 0: 0.1081
Accuracy at step 10: 0.7457
Accuracy at step 20: 0.8233
Accuracy at step 30: 0.8644
Accuracy at step 40: 0.8848
Accuracy at step 50: 0.8889
Accuracy at step 60: 0.8898
Accuracy at step 70: 0.8979
Accuracy at step 80: 0.9087
Accuracy at step 90: 0.9099
Adding run metadata for 99
Accuracy at step 100: 0.9125
Accuracy at step 110: 0.9184
Accuracy at step 120: 0.922
Accuracy at step 130: 0.9161
Accuracy at step 140: 0.9219
Accuracy at step 150: 0.9151
Accuracy at step 160: 0.9199
Accuracy at step 170: 0.9305
Accuracy at step 180: 0.9251
Accuracy at step 190: 0.9258
Adding run metadata for 199
Accuracy at step 200: 0.9315
Accuracy at step 210: 0.9361
Accuracy at step 220: 0.9357
Accuracy at step 230: 0.9392
Accuracy at step 240: 0.9387
Accuracy at step 250: 0.9401
Accuracy at step 260: 0.9398
Accuracy at step 270: 0.9407
Accuracy at step 280: 0.9434
Accuracy at step 290: 0.9447
Adding run metadata for 299
Accuracy at step 300: 0.9463
Accuracy at step 310: 0.943
Accuracy at step 320: 0.9439
Accuracy at step 330: 0.943
Accuracy at step 340: 0.9457
Accuracy at step 350: 0.9497
Accuracy at step 360: 0.9481
Accuracy at step 370: 0.9466
Accuracy at step 380: 0.9514
Accuracy at step 390: 0.948
Adding run metadata for 399
Accuracy at step 400: 0.9469
Accuracy at step 410: 0.9489
Accuracy at step 420: 0.9529
Accuracy at step 430: 0.9507
Accuracy at step 440: 0.9504
Accuracy at step 450: 0.951
Accuracy at step 460: 0.9512
Accuracy at step 470: 0.9539
Accuracy at step 480: 0.9533
Accuracy at step 490: 0.9494
Adding run metadata for 499

Clean up resources

To remove the associated Kubernetes objects created in this article, use the kubectl delete job command as follows:

kubectl delete jobs samples-tf-mnist-demo
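
If you no longer need the device plugin either, you can also remove the DaemonSet and its namespace (this deletes everything in gpu-resources, so make sure nothing else runs there):

kubectl delete -f nvidia-device-plugin-ds.yaml
kubectl delete namespace gpu-resources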

Next steps

To run Apache Spark jobs, see Run Apache Spark jobs on AKS.

For more information about running machine learning (ML) workloads on Kubernetes, see Kubeflow Labs.