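Image processing with batch model deployments

Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

You can use batch model deployments to process tabular data, but also any other file type, such as images. Both MLflow models and custom models support these deployments. This article shows how to deploy a model that classifies images according to the ImageNet taxonomy.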

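Prerequisites

An Azure subscription. If you don't have an Azure subscription, create a trial subscription before you begin.
An Azure Machine Learning workspace. To create a workspace, see Manage Azure Machine Learning workspaces.
The following permissions in the Azure Machine Learning workspace:
- To create or manage batch endpoints and deployments: use an Owner, Contributor, or custom role that has the Microsoft.MachineLearningServices/workspaces/batchEndpoints/* permissions.
- To create Azure Resource Manager deployments in the workspace resource group: use an Owner, Contributor, or custom role that has the Microsoft.Resources/deployments/write permission in the resource group where the workspace is deployed.
The Azure Machine Learning CLI or the Azure Machine Learning SDK for Python:
- Azure CLI
- Python
With the Azure CLI installed, run the following command to add the ml extension for Azure Machine Learning: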
```
az extension add -n ml
```
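Pipeline component deployments for batch endpoints require version 2.7 or later of the ml extension for the Azure CLI (current version: 2.37.0). Use the az extension update --name ml command to get the latest version.
Run the following command to install the Azure Machine Learning SDK for Python: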
```
pip install azure-ai-ml
```
The ModelBatchDeployment and PipelineComponentBatchDeployment classes require version 1.7.0 or later of the SDK (current version: 1.32.0). Use the pip install -U azure-ai-ml command to get the latest version.

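Connect to your workspace

The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, you connect to the workspace where you perform your deployment tasks.

Azure CLI
Python

In the following commands, enter your subscription ID, workspace name, resource group name, and location: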

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

Import the required libraries:

from azure.ai.ml import MLClient, Input, load_component
from azure.ai.ml.entities import BatchEndpoint, ModelBatchDeployment, ModelBatchDeploymentSettings, PipelineComponentBatchDeployment, Model, AmlCompute, Data, BatchRetrySettings, CodeConfiguration, Environment
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

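Configure the workspace details and get a handle to the workspace:

In the following code, enter your subscription ID, resource group name, and workspace name: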

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

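About this example

The model in this article is built with TensorFlow and uses the ResNet architecture. For more information, see Identity Mappings in Deep Residual Networks. You can download the model from https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/model.zip. The model has the following constraints:

It works with images of size 244x244 (tensors of (224, 224, 3)).
It requires inputs to be scaled to the range [0,1].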
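
As a rough illustration of the second constraint, the following sketch scales 8-bit channel values into [0, 1]. It uses plain Python lists instead of image tensors, and the `scale_pixels` helper is a hypothetical name that does not appear in the sample repository.

```python
def scale_pixels(pixels):
    """Scale 8-bit channel values (0-255) into the range [0, 1]."""
    return [value / 255.0 for value in pixels]


# Three made-up channel values: darkest, mid-range, and brightest.
scaled = scale_pixels([0, 128, 255])
print(scaled)
```

The scoring scripts later in this article apply the same division by 255.0 to whole image arrays.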

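The information in this article is based on code samples in the azureml-examples repository. To run the commands locally without copying or pasting YAML and other files, clone the repository. Then change directory to cli/endpoints/batch/deploy-models/imagenet-classifier if you use the Azure CLI, or to sdk/python/endpoints/batch/deploy-models/imagenet-classifier if you use the SDK for Python.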

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli/endpoints/batch/deploy-models/imagenet-classifier

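Follow along in Jupyter Notebooks

You can follow along with this example in a Jupyter Notebook. In the cloned repository, open the notebook imagenet-classifier-batch.ipynb.

Image classification with batch deployments

This example shows how to deploy a deep learning model that classifies a given image according to the ImageNet taxonomy.

Create the endpoint

Create the endpoint that hosts the model: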

Azure CLI
Python

Specify a name for the endpoint.

ENDPOINT_NAME="imagenet-classifier-batch"

Create the following YAML file, named endpoint.yml, to define the batch endpoint:

$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: imagenet-classifier-batch
description: A batch endpoint for performing image classification using a TFHub model ImageNet model.
auth_mode: aad_token

To create the endpoint, run the following command:

az ml batch-endpoint create --file endpoint.yml  --name $ENDPOINT_NAME

Specify a name for the endpoint.

endpoint_name="imagenet-classifier-batch"

Configure the endpoint.

endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A batch service to perform ImageNet image classification",
)

To create the endpoint, run the following code:

ml_client.batch_endpoints.begin_create_or_update(endpoint)

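Register the model

Model deployments can only deploy registered models, so you need to register the model. You can skip this step if the model you want to deploy is already registered.

Download a copy of the model.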

Azure CLI
Python

wget https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/model.zip
mkdir -p imagenet-classifier
unzip model.zip -d imagenet-classifier

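import os
import urllib.request
from zipfile import ZipFile

# Download the archive; urlretrieve returns a (local_filename, headers) tuple.
response = urllib.request.urlretrieve('https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/model.zip', 'model.zip')

os.makedirs("imagenet-classifier", exist_ok=True)
with ZipFile(response[0], 'r') as archive:
  archive.extractall(path="imagenet-classifier")

# extractall returns None, so set the path explicitly.
# Assumption: the archive unpacks into a "model" folder, which is the folder name the scoring script loads.
model_path = os.path.join("imagenet-classifier", "model")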

Register the model.

Azure CLI
Python

MODEL_NAME='imagenet-classifier'
az ml model create --name $MODEL_NAME --path "model"

model_name = 'imagenet-classifier'
model = ml_client.models.create_or_update(
    Model(name=model_name, path=model_path, type=AssetTypes.CUSTOM_MODEL)
)

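Create a scoring script

Create a scoring script that can read the images provided by the batch deployment and return the scores of the model.

The init method loads the model by using the keras module in tensorflow.
The run method runs for each mini-batch that the batch deployment provides.
The run method reads one image file at a time.
The run method resizes the images to the size the model expects.
The run method rescales the images to the range [0,1], which is what the model expects.
The script returns the classes and the probabilities associated with the predictions.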
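
The contract described above can be sketched without TensorFlow. In this minimal sketch, `init` loads a stand-in model once and `run` returns one row per file in the mini-batch. The `FakeModel` class, its fixed prediction, and the file paths are hypothetical placeholders, not part of the sample.

```python
from os.path import basename

model = None


class FakeModel:
    """Hypothetical stand-in for the real Keras model."""

    def predict(self, path):
        # Always returns the same (class, probability) pair.
        return 161, 0.99


def init():
    # Runs once, before any mini-batch is processed.
    global model
    model = FakeModel()


def run(mini_batch):
    # Runs once per mini-batch; mini_batch is a list of file paths.
    results = []
    for path in mini_batch:
        pred_class, pred_prob = model.predict(path)
        results.append([basename(path), pred_class, pred_prob])
    return results


init()
rows = run(["/data/a.JPEG", "/data/b.JPEG"])
print(rows)
```

Keeping one output row per input file makes it straightforward to trace each prediction back to its image.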

This code is the code/score-by-file/batch_driver.py file:

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from os.path import basename
from PIL import Image
from tensorflow.keras.models import load_model


def init():
    global model
    global input_width
    global input_height

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)
    input_width = 244
    input_height = 244


def run(mini_batch):
    results = []

    for image in mini_batch:
        data = Image.open(image).resize(
            (input_width, input_height)
        )  # Read and resize the image
        data = np.array(data) / 255.0  # Normalize
        data_batch = tf.expand_dims(
            data, axis=0
        )  # create a batch of size (1, 244, 244, 3)

        # perform inference
        pred = model.predict(data_batch)

        # Compute probabilities, classes and labels
        pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1)).numpy()
        pred_class = tf.math.argmax(pred, axis=-1).numpy()

        results.append([basename(image), pred_class[0], pred_prob])

    return pd.DataFrame(results)
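
The final step of run, turning the raw model output into a class and a probability, can be illustrated in plain Python. This is only a sketch of the softmax and argmax arithmetic with made-up scores for three classes; the real script applies the TensorFlow equivalents to the model output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 3.0, 0.5]  # made-up scores for three classes
probs = softmax(logits)

# argmax: index of the most probable class
pred_class = max(range(len(probs)), key=probs.__getitem__)
pred_prob = probs[pred_class]
print(pred_class, pred_prob)
```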

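Tip

Although the deployment provides images in mini-batches, this scoring script processes one image at a time. This is a common pattern, because trying to load the whole batch and send it to the model at once can put high memory pressure on the batch executor and lead to out-of-memory (OOM) exceptions.

In some cases, however, loading the whole batch enables high throughput in the scoring task. That is the case for batch deployments on GPU hardware, where you want to achieve high GPU utilization. For a scoring script that takes advantage of this approach, see High throughput deployments.

Note

If you want to deploy a generative model that produces files, learn how to author the scoring script: Customize outputs in batch deployments.

Create the deployment

After you create the scoring script, create a batch deployment for it. Use the following procedure:

Make sure that you have a compute cluster where you can create the deployment. This example uses a compute cluster named gpu-cluster. Using a GPU speeds up processing, although it isn't required.

Indicate the environment in which to run the deployment. In this example, the model runs on TensorFlow. Azure Machine Learning already has an environment with the required software installed, so you can reuse it. You need to add a couple of dependencies in a conda.yml file.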

Azure CLI
Python

The environment definition is included in the deployment file.

compute: azureml:gpu-cluster
environment:
  name: tensorflow27-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest

Get a reference to the environment.

environment = Environment(
    name="tensorflow27-cuda11-gpu",
    conda_file="environment/conda.yml",
    image="mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest",
)

Create the deployment.

Azure CLI
Python

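To create a new deployment under the created endpoint, create a YAML configuration like the following example. For other properties, see the full batch endpoint YAML schema.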

$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
endpoint_name: imagenet-classifier-batch
name: imagenet-classifier-resnetv2
description: A ResNetV2 model architecture for performing ImageNet classification in batch
type: model
model: azureml:imagenet-classifier@latest
compute: azureml:gpu-cluster
environment:
  name: tensorflow27-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code/score-by-file
  scoring_script: batch_driver.py
resources:
  instance_count: 2
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 5
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 300
  error_threshold: -1
  logging_level: info
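
To get a feel for these settings, the following back-of-the-envelope sketch estimates how the work is split. It assumes 1,000 input files, as in the sample dataset used later in this article, and it treats every instance as running max_concurrency_per_instance workers. The actual scheduling is up to the service, so read the numbers as an approximation.

```python
import math

total_files = 1000                # size of the sample dataset (assumption)
mini_batch_size = 5               # files handed to each run() call
instance_count = 2
max_concurrency_per_instance = 1

mini_batches = math.ceil(total_files / mini_batch_size)
parallel_workers = instance_count * max_concurrency_per_instance
batches_per_worker = math.ceil(mini_batches / parallel_workers)

print(mini_batches, parallel_workers, batches_per_worker)
```

With these values, each worker handles about half of the mini-batches, so raising instance_count is the main lever for shortening the job.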

Create the deployment by using the following command:

DEPLOYMENT_NAME="imagenet-classifier-resnetv2"
az ml batch-deployment create -f deployment.yml

To create a new deployment with the indicated environment and scoring script, use the following code:

compute_name = "gpu-cluster"  # the compute cluster this example uses

# ModelBatchDeployment and ModelBatchDeploymentSettings are the classes imported earlier.
deployment = ModelBatchDeployment(
    name="imagenet-classifier-resnetv2",
    description="A ResNetV2 model architecture for performing ImageNet classification in batch",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="code/score-by-file",
        scoring_script="batch_driver.py",
    ),
    compute=compute_name,
    # Run-time behavior of the deployment goes in the settings object.
    settings=ModelBatchDeploymentSettings(
        instance_count=2,
        max_concurrency_per_instance=1,
        mini_batch_size=10,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
        logging_level="info",
    ),
)

Create the deployment by using the following code:

ml_client.batch_deployments.begin_create_or_update(deployment)

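Although you can invoke a specific deployment inside an endpoint, you usually want to invoke the endpoint itself and let the endpoint decide which deployment to use. That deployment is called the default deployment.

With this approach, you can change the default deployment, and therefore the model that serves the deployment, without changing the contract with the user who invokes the endpoint. Use the following code to update the default deployment: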
- Azure Machine Learning CLI
- Azure Machine Learning SDK for Python
```
az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
```
```
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint)
```

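Your batch endpoint is ready to use.

Test the deployment

To test the endpoint, use a sample of 1,000 images from the original ImageNet dataset. Batch endpoints can only process data that is located in the cloud and is accessible from the Azure Machine Learning workspace. Upload the sample to an Azure Machine Learning data store, and create a data asset that you can use to invoke the endpoint for scoring.

Note

Batch endpoints accept data that can be placed in multiple types of locations.

Download the associated sample data.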

Azure CLI
Python

wget https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/imagenet-1000.zip
unzip imagenet-1000.zip -d data

Note

If you don't have wget installed locally, install it or use a browser to get the .zip file.

!wget https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet-1000.zip
!unzip imagenet-1000.zip -d data

Create the data asset from the downloaded data.

Azure CLI
Python

Create a data asset definition in a YAML file named imagenet-sample-unlabeled.yml:

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: imagenet-sample-unlabeled
description: A sample of 1000 images from the original ImageNet dataset. Download content from https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet-1000.zip.
type: uri_folder
path: data

Create the data asset.

az ml data create -f imagenet-sample-unlabeled.yml

Specify the following values:

data_path = "data"
dataset_name = "imagenet-sample-unlabeled"

imagenet_sample = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="A sample of 1000 images from the original ImageNet dataset",
    name=dataset_name,
)

Create the data asset.

ml_client.data.create_or_update(imagenet_sample)

To get the newly created data asset, use the following code:

imagenet_sample = ml_client.data.get(dataset_name, label="latest")

After the data is uploaded and ready to use, invoke the endpoint.
- Azure CLI
- Python
```
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input azureml:imagenet-sample-unlabeled@latest | jq -r '.name')
```
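Note

If the jq utility isn't installed, see Download jq.
Tip

What's the difference between the inputs and input parameters when you invoke an endpoint?

In general, you can pass a dictionary in the inputs = {} parameter of the invoke method to provide any number of required inputs to a batch endpoint that contains a model deployment or a pipeline deployment.

For model deployments, you can use the input parameter as a shorter way to specify the input data location for the deployment. This approach works because a model deployment always takes exactly one data input.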
```
input = Input(type=AssetTypes.URI_FOLDER, path=imagenet_sample.id)
job = ml_client.batch_endpoints.invoke(
   endpoint_name=endpoint.name,
   input=input,
)
```

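Tip

You don't need to specify the deployment name in the invoke operation, because the endpoint automatically routes the job to the default deployment. Because the endpoint has only one deployment, that deployment is the default. To target a specific deployment, pass the deployment_name argument or parameter.

A batch job starts as soon as the command returns. You can monitor the status of the job until it finishes.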
- Azure CLI
- Python
```
az ml job show --name $JOB_NAME
```
```
ml_client.jobs.get(job.name)
```
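
A polling loop is one way to wait for the job. The following sketch is library-agnostic: `get_status` is a hypothetical callable that you would implement yourself, for example by returning the status of `ml_client.jobs.get(job.name)`, and the set of terminal state names is an assumption that you should check against the service.

```python
import time

# Assumed terminal states; verify them against the service you use.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30.0, max_polls=120):
    """Poll get_status() until it returns a terminal state or polls run out."""
    status = get_status()
    polls = 1
    while status not in TERMINAL_STATES and polls < max_polls:
        time.sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status


# Simulated job that reports "Running" twice and then "Completed".
fake_statuses = iter(["Running", "Running", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

The max_polls limit keeps the loop from running forever if the job never reaches a terminal state.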

After the job finishes, download the predictions.

Azure CLI
Python

To download the predictions, use the following command:

az ml job download --name $JOB_NAME --output-name score --download-path ./

ml_client.jobs.download(name=job.name, output_name='score', download_path='./')

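The predictions look like the following output. The predictions are combined with the labels for the reader's convenience. To learn more about how to achieve this effect, see the associated notebook.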

import pandas as pd
score = pd.read_csv("named-outputs/score/predictions.csv", header=None,  names=['file', 'class', 'probabilities'], sep=' ')
score['label'] = score['class'].apply(lambda pred: imagenet_labels[pred])
score

file	class	probabilities	label
n02088094_Afghan_hound.JPEG	161	0.994745	Afghan hound
n02088238_basset	162	0.999397	basset
n02088364_beagle.JPEG	165	0.366914	bluetick
n02088466_bloodhound.JPEG	164	0.926464	bloodhound
...	...	...	...
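
If pandas isn't available, the same join can be sketched with the standard library. This example parses two rows in the space-separated layout that the pandas call above reads, and attaches labels from a small lookup table. The `imagenet_labels` dictionary here is a hypothetical two-entry stand-in for the full label list that the notebook builds.

```python
import csv
import io

# Two rows in the space-separated layout read by the pandas call above.
sample = (
    "n02088094_Afghan_hound.JPEG 161 0.994745\n"
    "n02088364_beagle.JPEG 165 0.366914\n"
)

# Hypothetical label lookup with only the two classes used here.
imagenet_labels = {161: "Afghan hound", 165: "bluetick"}

rows = []
for file_name, class_id, probability in csv.reader(io.StringIO(sample), delimiter=" "):
    rows.append(
        {
            "file": file_name,
            "class": int(class_id),
            "probability": float(probability),
            "label": imagenet_labels[int(class_id)],
        }
    )

print(rows[0])
```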

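High throughput deployments

As mentioned before, the deployment processes one image at a time, even when the batch deployment provides a batch of them. In most cases, this approach is best. It simplifies how the models run and avoids possible out-of-memory problems. However, in certain other cases, you might want to saturate the underlying hardware as much as possible. GPUs are one such case.

In those cases, you might want to run inference on the entire batch of data. That approach means loading the entire set of images into memory and sending them directly to the model. The following example uses TensorFlow to read a batch of images and score them all at once. It also uses TensorFlow operations for any data preprocessing, so the whole pipeline runs on the same device in use (CPU/GPU).

Warning

Some models have a nonlinear relationship between input size and memory consumption. To avoid out-of-memory exceptions, batch again (as done in this example) or decrease the size of the batches that the batch deployment creates.

Create the scoring script code/score-by-batch/batch_driver.py: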

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import load_model


def init():
    global model
    global input_width
    global input_height

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)
    input_width = 244
    input_height = 244


def decode_img(file_path):
    file = tf.io.read_file(file_path)
    img = tf.io.decode_jpeg(file, channels=3)
    img = tf.image.resize(img, [input_width, input_height])
    return img / 255.0


def run(mini_batch):
    images_ds = tf.data.Dataset.from_tensor_slices(mini_batch)
    images_ds = images_ds.map(decode_img).batch(64)

    # perform inference
    pred = model.predict(images_ds)

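    # Compute probabilities, classes and labels.
    # Reduce over the class axis so each image keeps its own probability.
    pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1), axis=-1).numpy()
    pred_class = tf.math.argmax(pred, axis=-1).numpy()

    # Build one row per image; a dict of columns keeps the three arrays aligned.
    return pd.DataFrame(
        {"file": mini_batch, "probability": pred_prob, "class": pred_class}
    )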

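This script constructs a tensor dataset from the mini-batch that the batch deployment sends. The dataset is preprocessed with the map operation and the decode_img function to obtain the tensors the model expects.
The dataset is batched again (64) before the data is sent to the model. Use this parameter to control how much information is loaded into memory and sent to the model at once. If you run on a GPU, tune this parameter carefully to reach the maximum GPU usage just before you get an OOM exception.
After the predictions are computed, the tensors are converted to numpy.ndarray.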
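
The re-batching step can be pictured without TensorFlow. This sketch splits a mini-batch of file names into sub-batches of at most 64 items, which is roughly what calling .batch(64) does to the dataset. The `chunk` helper and the file names are hypothetical.

```python
def chunk(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


mini_batch = [f"img_{i}.JPEG" for i in range(150)]  # made-up file names
sub_batches = list(chunk(mini_batch, 64))
print([len(batch) for batch in sub_batches])
```

A smaller size lowers peak memory use at the cost of more calls to the model, which is the trade-off to tune on GPU hardware.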