Deploy a deep learning model for inference with GPU

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

This article teaches you how to use Azure Machine Learning to deploy a GPU-enabled model as a web service. The information in this article is based on deploying a model on Azure Kubernetes Service (AKS). The AKS cluster provides a GPU resource that is used by the model for inference.

Inference, or model scoring, is the phase where the deployed model is used to make predictions. Using GPUs instead of CPUs offers performance advantages on highly parallelizable computation.

Important

For web service deployments, GPU inference is only supported on Azure Kubernetes Service. For inference using a machine learning pipeline, GPUs are only supported on Azure Machine Learning Compute. For more information on using ML pipelines, see Run batch predictions.

Tip

Although the code snippets in this article use a TensorFlow model, you can apply the information to any machine learning framework that supports GPUs.

Note

The information in this article builds on the How to deploy to Azure Kubernetes Service article. Where that article generally covers deployment to AKS, this article covers GPU-specific deployment.

Prerequisites

Connect to your workspace

To connect to an existing workspace, use the following code:

Important

This code snippet expects the workspace configuration to be saved in the current directory or its parent. For more information on creating a workspace, see Create and manage Azure Machine Learning workspaces. For more information on saving the configuration to file, see Create a workspace configuration file.

from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()
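The from_config() call looks for a config.json file in the current directory or a parent directory. A sketch of its shape, with placeholder values rather than real identifiers:

```
{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}
```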

Create a Kubernetes cluster with GPUs

Azure Kubernetes Service provides many different GPU options. You can use any of them for model inference. See the list of N-series VMs for a full breakdown of capabilities and costs.

The following code demonstrates how to create a new AKS cluster for your workspace:

from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "aks-gpu"

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    # Provision AKS cluster with GPU machine
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6")

    # Create the cluster
    aks_target = ComputeTarget.create(
        workspace=ws, name=aks_name, provisioning_configuration=prov_config
    )

    aks_target.wait_for_completion(show_output=True)

Important

Azure will bill you as long as the AKS cluster exists. Make sure to delete your AKS cluster when you're done with it.

For more information on using AKS with Azure Machine Learning, see How to deploy to Azure Kubernetes Service.

Write the entry script

The entry script receives data submitted to the web service, passes it to the model, and returns the scoring results. The following script loads the TensorFlow model on startup, and then uses the model to score data.

Tip

The entry script is specific to your model. For example, the script must know the framework to use with your model, the expected data formats, and so on.

import json
import numpy as np
import os
import tensorflow as tf

from azureml.core.model import Model


def init():
    global X, output, sess
    tf.reset_default_graph()
    model_root = os.getenv('AZUREML_MODEL_DIR')
    # the name of the folder in which to look for tensorflow model files
    tf_model_folder = 'model'
    saver = tf.train.import_meta_graph(
        os.path.join(model_root, tf_model_folder, 'mnist-tf.model.meta'))
    X = tf.get_default_graph().get_tensor_by_name("network/X:0")
    output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")

    sess = tf.Session()
    saver.restore(sess, os.path.join(model_root, tf_model_folder, 'mnist-tf.model'))


def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    out = output.eval(session=sess, feed_dict={X: data})
    y_hat = np.argmax(out, axis=1)
    return y_hat.tolist()

This file is named score.py. For more information on entry scripts, see How and where to deploy.
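The JSON-in, label-list-out contract implied by run() can be exercised locally without TensorFlow. The sketch below uses a hypothetical stand-in whose "model" is simply the index of the largest value in each input row, mimicking the argmax the entry script applies to the network output:

```python
import json

def run_stub(raw_data):
    # Stand-in for run(): parse the JSON payload, then pick the index of
    # the largest value in each row (mimicking np.argmax over the output).
    data = json.loads(raw_data)["data"]
    return [max(range(len(row)), key=row.__getitem__) for row in data]

# A payload shaped like the one the web service expects
payload = json.dumps({"data": [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]]})
print(run_stub(payload))  # [1, 0]
```

This is only a shape check for the request and response format; the real scoring path goes through the TensorFlow session in score.py.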

Define the conda environment

The conda environment file specifies the dependencies for the service. It includes dependencies required by both the model and the entry script. You must list azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service. The following YAML defines the environment for a TensorFlow model. It specifies tensorflow-gpu, which makes use of the GPU in this deployment:

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  # You must list azureml-defaults as a pip dependency
  - azureml-defaults>=1.0.45
- numpy
- tensorflow-gpu=1.12
channels:
- conda-forge

For this example, the file is saved as myenv.yml.

Define the deployment configuration

The deployment configuration defines the Azure Kubernetes Service environment used to run the web service:

from azureml.core.webservice import AksWebservice

gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=2,
                                                    memory_gb=4)

For more information, see the reference documentation for AksWebservice.deploy_configuration.

Define the inference configuration

The inference configuration points to the entry script and an environment object, which uses a Docker image with GPU support. Note that the YAML file used for the environment definition must list azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service.

from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
myenv.docker.base_image = DEFAULT_GPU_IMAGE
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

For more information on environments, see Create and manage environments for training and deployment. For more information, see the reference documentation for InferenceConfig.

Deploy the model

Deploy the model to your AKS cluster and wait for it to create your service.

from azureml.core.model import Model

# Name of the web service that is deployed
aks_service_name = 'aks-dnn-mnist'
# Get the registered model
model = Model(ws, "tf-dnn-mnist")
# Deploy the model
aks_service = Model.deploy(ws,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           name=aks_service_name)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

Note

If the InferenceConfig object has enable_gpu=True, then the deployment_target parameter must reference a cluster that provides a GPU. Otherwise, the deployment will fail.

For more information, see the reference documentation for Model.

Issue a sample query to your service

Send a test query to the deployed model. The following code sample downloads MNIST test data, selects a random test image, serializes it as JSON, and sends it to the service for scoring.

# Used to test your webservice
import os
import urllib
import gzip
import numpy as np
import struct
import requests

# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        struct.unpack('I', gz.read(4))
        n_items = struct.unpack('>I', gz.read(4))
        if not label:
            n_rows = struct.unpack('>I', gz.read(4))[0]
            n_cols = struct.unpack('>I', gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res

# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
    return np.eye(num_of_classes)[array.reshape(-1)]

# Download test data
os.makedirs('./data/mnist', exist_ok=True)
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/mnist/test-images.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/mnist/test-labels.gz')

# Load test data from model training
X_test = load_data('./data/mnist/test-images.gz', False) / 255.0
y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)

# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test))
input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"

api_key = aks_service.get_keys()[0]
headers = {'Content-Type': 'application/json',
           'Authorization': ('Bearer ' + api_key)}
resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)

print("POST to url", aks_service.scoring_uri)
print("label:", y_test[random_index])
print("prediction:", resp.text)

For more information on creating a client application, see Create client to consume deployed web service.

Clean up the resources

If you created the AKS cluster specifically for this example, delete your resources after you're done.

Important

Azure bills you based on how long the AKS cluster is deployed. Make sure to clean it up after you are done with it.

aks_service.delete()
aks_target.delete()

Next steps