Deploy a deep learning model for inference with GPU

08/18/2023

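This article explains how to use Azure Machine Learning to deploy a GPU-enabled model as a web service. The information in this article is based on deploying a model on Azure Kubernetes Service (AKS). The AKS cluster provides the GPU resources that the model uses for inference.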

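Inference, or model scoring, is the phase in which the deployed model is used to make predictions. Using GPUs instead of CPUs offers performance advantages for highly parallelizable computation.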

Note

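Azure Machine Learning endpoints (v2) provide an improved, simpler deployment experience. Endpoints support both real-time and batch inference scenarios, and they provide a unified interface for invoking and managing model deployments across compute types. See What are Azure Machine Learning endpoints?.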

Important

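For web service deployments, GPU inference is supported only on Azure Kubernetes Service. For inference using a machine learning pipeline, GPUs are supported only on Azure Machine Learning compute. For more information on using ML pipelines, see Tutorial: Build an Azure Machine Learning pipeline for batch scoring.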

Tip

Although the code snippets in this article use a TensorFlow model, you can apply the information to any machine learning framework that supports GPUs.

Note

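The information in this article builds on the How to deploy to Azure Kubernetes Service article. That article covers deployment to AKS in general; this article covers GPU-specific deployment.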

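Prerequisites

An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning workspace.
A Python development environment with the Azure Machine Learning SDK installed. For more information, see Azure Machine Learning SDK.
A registered model that uses a GPU.
- To learn how to register models, see Deploy models.
- To create and register the TensorFlow model used in this article, see How to train TensorFlow models.
A general understanding of how and where to deploy models.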

Connect to your workspace

To connect to an existing workspace, use the following code:

Important

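This code snippet expects the workspace configuration to be saved in the current directory or one of its parent directories. For more information on creating a workspace, see Create workspace resources. For more information on saving the configuration to a file, see Create a workspace configuration file.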

from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()
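The note above says the configuration file must sit in the current directory or one of its parents. The following sketch illustrates that lookup rule with a hypothetical helper, find_workspace_config; it is not the SDK's implementation, and the candidate file names it checks (config.json and .azureml/config.json) are assumptions about where the file may be placed.

```python
import json
from pathlib import Path

# Assumed candidate locations, checked in each directory from the start
# directory upward. This is an illustration, not the SDK's actual search list.
CANDIDATES = ("config.json", ".azureml/config.json")


def find_workspace_config(start="."):
    """Return the parsed JSON of the nearest matching config file, or None.

    Hypothetical helper: walks from `start` up through every parent directory
    and returns the first candidate file it finds.
    """
    directory = Path(start).resolve()
    for folder in [directory, *directory.parents]:
        for name in CANDIDATES:
            path = folder / name
            if path.is_file():
                with path.open() as f:
                    return json.load(f)
    return None
```

A workspace configuration file typically holds the subscription_id, resource_group, and workspace_name keys, so calling find_workspace_config() from a project subfolder would return those values from the closest file above it.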

Create a Kubernetes cluster with GPUs

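Azure Kubernetes Service provides many different GPU options, and you can use any of them for model inference. For a full breakdown of their capabilities and costs, see the list of N-series VMs.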

The following code demonstrates how to create a new AKS cluster for your workspace:

from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "aks-gpu"

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    # Provision AKS cluster with GPU machine
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6")

    # Create the cluster
    aks_target = ComputeTarget.create(
        workspace=ws, name=aks_name, provisioning_configuration=prov_config
    )

    aks_target.wait_for_completion(show_output=True)

Important

Azure bills you for as long as the AKS cluster exists. Make sure to delete your AKS cluster when you're done with it.

For more information on using AKS with Azure Machine Learning, see How to deploy to Azure Kubernetes Service.

Write the entry script

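The entry script receives data submitted to the web service, passes it to the model, and returns the scoring results. The following script loads the TensorFlow model on startup, and then uses the model to score data.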

Tip

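The entry script is specific to your model. For example, the script must know which framework to use with your model, the format of the incoming data, and so on.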

import json
import numpy as np
import os
import tensorflow as tf

from azureml.core.model import Model


def init():
    global X, output, sess
    tf.reset_default_graph()
    model_root = os.getenv('AZUREML_MODEL_DIR')
    # the name of the folder in which to look for tensorflow model files
    tf_model_folder = 'model'
    saver = tf.train.import_meta_graph(
        os.path.join(model_root, tf_model_folder, 'mnist-tf.model.meta'))
    X = tf.get_default_graph().get_tensor_by_name("network/X:0")
    output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")

    sess = tf.Session()
    saver.restore(sess, os.path.join(model_root, tf_model_folder, 'mnist-tf.model'))


def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    out = output.eval(session=sess, feed_dict={X: data})
    y_hat = np.argmax(out, axis=1)
    return y_hat.tolist()

This file is named score.py. For more information on entry scripts, see How and where to deploy.
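To check the request and response contract of run() without TensorFlow or a GPU, you can exercise the same JSON parsing and argmax logic against a stand-in model. This is a sketch: the random weight matrix and the stub_run function are hypothetical and take the place of the restored TensorFlow graph, and the 784-value rows assume flattened 28×28 MNIST images.

```python
import json

import numpy as np

# Hypothetical stand-in for the graph restored in init(): a fixed linear layer
# that maps 784 pixel values to 10 class scores.
rng = np.random.default_rng(0)
WEIGHTS = rng.normal(size=(784, 10))


def stub_run(raw_data):
    """Mimic score.py's run(): a JSON string in, a list of class indices out."""
    data = np.array(json.loads(raw_data)["data"])  # shape: (batch, 784)
    scores = data @ WEIGHTS                        # shape: (batch, 10)
    return np.argmax(scores, axis=1).tolist()      # plain Python ints


# Two rows in the same {"data": [[...], ...]} shape the service expects.
payload = json.dumps({"data": [[0.5] * 784, [0.0] * 784]})
print(stub_run(payload))
```

Because run() returns y_hat.tolist(), the service responds with one class index per input row; the stub keeps that shape so a client can be tested locally first.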

Define the conda environment

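The conda environment file specifies the dependencies for the service, including those required by both the model and the entry script. Note that you must list azureml-defaults version 1.0.45 or later as a pip dependency, because it contains the functionality needed to host the model as a web service. The following YAML defines the environment for a TensorFlow model. It specifies tensorflow-gpu, which makes use of the GPU used in this deployment: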

name: project_environment
dependencies:
  # The Python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  # You must list azureml-defaults as a pip dependency
  - azureml-defaults>=1.0.45
  - numpy
  # pip pins exact versions with "==" (a single "=" is conda syntax)
  - tensorflow-gpu==1.12
channels:
- conda-forge

In this example, the file is saved as myenv.yml.
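Because the entries under pip: are installed by pip, they follow pip's requirement syntax: == pins an exact version, and a single = (conda syntax) is rejected as an invalid requirement. The following hypothetical pre-flight check, check_pip_specs, is a sketch that takes the pip list as plain strings (reading the YAML itself is left out to avoid a PyYAML dependency) and flags lone = pins and a missing or too-old azureml-defaults. The 1.0.45 minimum comes from this article; the regular expressions are simplifications, not a full PEP 440 parser.

```python
import re

MIN_DEFAULTS = (1, 0, 45)  # minimum azureml-defaults version stated in this article


def parse_version(text):
    """Turn '1.0.45' into (1, 0, 45) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_pip_specs(pip_specs):
    """Return a list of problems found in a conda file's pip dependency list."""
    problems = []
    found_defaults = False
    for spec in pip_specs:
        # A lone '=' (not part of ==, >=, <=, != or ~=) is conda syntax; pip rejects it.
        if re.search(r"(?<![=<>!~])=(?!=)", spec):
            problems.append(f"{spec}: use '==' to pin an exact version with pip")
        # The package name is everything before the first operator, extra, or space.
        name = re.split(r"[<>=!~\[\s]", spec, maxsplit=1)[0]
        if name == "azureml-defaults":
            found_defaults = True
            lower = re.search(r">=\s*([0-9]+(?:\.[0-9]+)*)", spec)
            if lower and parse_version(lower.group(1)) < MIN_DEFAULTS:
                problems.append(f"{spec}: azureml-defaults must be 1.0.45 or later")
    if not found_defaults:
        problems.append("azureml-defaults is missing from the pip dependencies")
    return problems


# Example: the single '=' on the TensorFlow pin is reported.
print(check_pip_specs(["azureml-defaults>=1.0.45", "numpy", "tensorflow-gpu=1.12"]))
```

An unpinned azureml-defaults entry passes this check, on the assumption that pip then installs a current release that already satisfies the minimum.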

Define the deployment configuration

Important

AKS doesn't allow pods to share GPUs, so a GPU-enabled web service can have at most as many replicas as there are GPUs in the cluster.

The deployment configuration defines the Azure Kubernetes Service environment used to run the web service:

from azureml.core.webservice import AksWebservice

gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=2,
                                                    memory_gb=4)
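Because GPUs can't be shared between pods, the num_replicas value above is capped by the cluster's total GPU count. The arithmetic can be sketched as follows, assuming each replica uses one whole GPU; max_gpu_replicas and check_replicas are hypothetical helpers, and the three-node, one-GPU-per-node figures are illustrative assumptions (for example, three nodes of a single-GPU size such as Standard_NC6).

```python
def max_gpu_replicas(node_count, gpus_per_node):
    """Total GPUs in the cluster, which is the replica limit when pods can't share GPUs."""
    return node_count * gpus_per_node


def check_replicas(num_replicas, node_count, gpus_per_node):
    """Raise if the requested replicas exceed the GPUs available; return the limit."""
    limit = max_gpu_replicas(node_count, gpus_per_node)
    if num_replicas > limit:
        raise ValueError(
            f"num_replicas={num_replicas} exceeds the {limit} GPUs in the cluster"
        )
    return limit


# Assumption: 3 nodes with 1 GPU each, matching num_replicas=3 above.
print(check_replicas(num_replicas=3, node_count=3, gpus_per_node=1))  # prints 3
```

With this assumed cluster, asking for a fourth replica would raise ValueError instead of leaving a pod that can't be scheduled.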

For more information, see the reference documentation for AksWebservice.deploy_configuration.

Define the inference configuration

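The inference configuration points to the entry script and to an environment object that uses a Docker image with GPU support. Note that the YAML file used for the environment definition must list azureml-defaults version 1.0.45 or later as a pip dependency, because it contains the functionality needed to host the model as a web service.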

from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
myenv.docker.base_image = DEFAULT_GPU_IMAGE
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

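For more information on environments, see Create and manage environments for training and deployment. For more information on the inference configuration, see the InferenceConfig reference documentation.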

Deploy the model

Deploy the model to your AKS cluster and wait for it to create your service.

from azureml.core.model import Model

# Name of the web service that is deployed
aks_service_name = 'aks-dnn-mnist'
# Get the registered model
model = Model(ws, "tf-dnn-mnist")
# Deploy the model
aks_service = Model.deploy(ws,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           name=aks_service_name)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

For more information, see the Model reference documentation.

Issue a sample query to your service

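Send a test query to the deployed model. The service scores the image data it receives; in this example, each image is sent as a JSON row of normalized pixel values rather than as a JPEG file. The following code example downloads the MNIST test data and then selects a random test image to send to the service.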

# Used to test your web service
import gzip
import json
import os
import struct
import urllib.request

import numpy as np
import requests

# Load a compressed MNIST .gz file (IDX format) and return a NumPy array
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        # Skip the 4-byte magic number
        struct.unpack('I', gz.read(4))
        # Number of items, stored as a big-endian 32-bit integer
        n_items = struct.unpack('>I', gz.read(4))
        if not label:
            n_rows = struct.unpack('>I', gz.read(4))[0]
            n_cols = struct.unpack('>I', gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res

# One-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
    return np.eye(num_of_classes)[array.reshape(-1)]

# Download test data
os.makedirs('./data/mnist', exist_ok=True)
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/mnist/test-images.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/mnist/test-labels.gz')

# Load test data from model training
X_test = load_data('./data/mnist/test-images.gz', False) / 255.0
y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)

# Send a random row from the test set to score
random_index = np.random.randint(0, len(X_test))
# json.dumps builds valid JSON; tolist() converts NumPy floats to Python floats
input_data = json.dumps({"data": [X_test[random_index].tolist()]})

api_key = aks_service.get_keys()[0]
headers = {'Content-Type': 'application/json',
           'Authorization': ('Bearer ' + api_key)}
resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)

print("POST to url", aks_service.scoring_uri)
print("label:", y_test[random_index])
print("prediction:", resp.text)
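The load_data helper relies on the IDX layout of the MNIST files: a 4-byte magic number, then one big-endian 32-bit size per dimension, then the raw unsigned bytes. The sketch below writes a tiny synthetic IDX image file and reads it back while validating the magic number (2051 for image files, 2049 for label files), which load_data reads but discards. The helper names write_idx_images and read_idx_images are hypothetical.

```python
import gzip
import struct

import numpy as np

IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned-byte data, 1 dimension


def write_idx_images(path, images):
    """Write an (n, rows, cols) uint8 array as a gzipped IDX image file."""
    n, rows, cols = images.shape
    with gzip.open(path, "wb") as gz:
        gz.write(struct.pack(">IIII", IMAGE_MAGIC, n, rows, cols))
        gz.write(images.astype(np.uint8).tobytes())


def read_idx_images(path):
    """Read a gzipped IDX image file into an (n, rows * cols) uint8 array."""
    with gzip.open(path, "rb") as gz:
        magic, n, rows, cols = struct.unpack(">IIII", gz.read(16))
        if magic != IMAGE_MAGIC:
            raise ValueError(f"not an IDX image file (magic={magic})")
        data = np.frombuffer(gz.read(n * rows * cols), dtype=np.uint8)
    return data.reshape(n, rows * cols)
```

Checking the magic number this way turns a mixed-up file (for example, passing the labels file where images are expected) into a clear error instead of a silently misshapen array.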

For more information on creating a client application, see Create a client to consume a deployed web service.
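The sample query above uses the requests package; a minimal client can build the same call with only the Python standard library. This sketch assumes the same {"data": [...]} payload and Bearer-key authorization header used in the sample query; build_scoring_request is a hypothetical helper, and the URI and key are placeholders for aks_service.scoring_uri and a key from aks_service.get_keys().

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, rows):
    """Build (but don't send) a POST request in the shape score.py expects."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values; substitute your service's scoring URI and key.
req = build_scoring_request("http://example.com/score", "<your-key>", [[0.0] * 784])
# To send it: urllib.request.urlopen(req).read() returns the body produced from run()'s result.
```

Keeping request construction separate from sending makes it easy to inspect the payload and headers before calling the live service.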

Clean up the resources

If you created the AKS cluster specifically for this example, delete your resources after you're done.

Important

Azure bills you based on how long the AKS cluster is deployed. Make sure to clean it up after you're done with it.

aks_service.delete()
aks_target.delete()
