Image processing with batch model deployments

2025/04/12

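Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

You can use batch model deployments to process tabular data, and also any other file type, such as images. These deployments are supported for both MLflow and custom models. This article shows how to deploy a model that classifies images according to the ImageNet taxonomy.

Prerequisites

An Azure subscription. If you don't have one, create a trial account before you begin.
An Azure Machine Learning workspace. To create one, see Manage Azure Machine Learning workspaces.
The following permissions in the Azure Machine Learning workspace:
- To create or manage batch endpoints and deployments: an Owner, Contributor, or custom role that has been assigned the Microsoft.MachineLearningServices/workspaces/batchEndpoints/* permissions.
- To create Azure Resource Manager deployments in the workspace resource group: an Owner, Contributor, or custom role that has been assigned the Microsoft.Resources/deployments/write permission in the resource group where the workspace is deployed.
The Azure Machine Learning CLI or the Azure Machine Learning SDK for Python:
- Azure CLI
- Python
Install the Azure CLI, then run the following command to add the ml extension for Azure Machine Learning: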
```
az extension add -n ml
```
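Pipeline component deployments for batch endpoints were introduced in version 2.7 of the ml extension for the Azure CLI. Use the az extension update --name ml command to get the latest version.
Run the following command to install the Azure Machine Learning SDK for Python: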
```
pip install azure-ai-ml
```
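The ModelBatchDeployment and PipelineComponentBatchDeployment classes were introduced in version 1.7.0 of the SDK. Use the pip install -U azure-ai-ml command to get the latest version.

Connect to your workspace

The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, you connect to the workspace where you perform your deployment tasks.

Azure CLI
Python

For the Azure CLI, enter your subscription ID, workspace name, resource group name, and location in the following commands: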

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

For Python, import the required libraries:

from azure.ai.ml import MLClient, Input, load_component
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment, ModelBatchDeployment, ModelBatchDeploymentSettings, PipelineComponentBatchDeployment, Model, AmlCompute, Data, BatchRetrySettings, CodeConfiguration, Environment
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

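Configure the workspace details and get a handle to the workspace. In the following code, enter your subscription ID, resource group name, and workspace name: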

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

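About this example

The model in this article is built with TensorFlow and the ResNet architecture. For more information, see Identity Mappings in Deep Residual Networks. You can download a sample of this model. The model has the following constraints: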

It works with images of size 244x244 (tensors of shape (244, 244, 3)).
It requires inputs to be scaled to the range [0,1].
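
The two constraints above can be sketched with NumPy alone. This is a minimal illustration, not the article's scoring script: the `preprocess` helper and the random image are assumptions made for the example, and a real image would first be resized to 244x244.

```python
import numpy as np


def preprocess(pixels: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values to [0, 1] and add a leading batch axis."""
    scaled = pixels.astype(np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)


# A synthetic 244x244 RGB image stands in for a real, already resized photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(244, 244, 3), dtype=np.uint8)

batch = preprocess(image)
print(batch.shape, batch.dtype)
```

The leading axis of size 1 is what lets a single image be passed to a model that expects a batch of images.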

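The information in this article is based on code samples in the azureml-examples repository. To run the commands locally without copying or pasting YAML and other files, clone the repository. If you use the Azure CLI, change directories to cli/endpoints/batch/deploy-models/imagenet-classifier; if you use the SDK for Python, change directories to sdk/python/endpoints/batch/deploy-models/imagenet-classifier.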

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli/endpoints/batch/deploy-models/imagenet-classifier

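Follow along in Jupyter Notebooks

You can follow along with this example in a Jupyter Notebook. In the cloned repository, open the notebook imagenet-classifier-batch.ipynb.

Image classification with batch deployments

This example shows how to deploy a deep learning model that classifies a given image according to the ImageNet taxonomy.

Create the endpoint

Create the endpoint that hosts the model:

Azure CLI
Python

For the Azure CLI, specify a name for the endpoint.

ENDPOINT_NAME="imagenet-classifier-batch"

Create the following YAML file, named endpoint.yml, to define the batch endpoint: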

$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: imagenet-classifier-batch
description: A batch endpoint for performing image classification using a TFHub ImageNet model.
auth_mode: aad_token

To create the endpoint, run the following command:

az ml batch-endpoint create --file endpoint.yml --name $ENDPOINT_NAME

For Python, specify a name for the endpoint.

endpoint_name="imagenet-classifier-batch"

Configure the endpoint.

endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A batch service to perform ImageNet image classification",
)

To create the endpoint, run the following code:

ml_client.batch_endpoints.begin_create_or_update(endpoint)

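Register the model

Model deployments can only deploy registered models, so you need to register the model. If the model you want to deploy is already registered, skip this step.

Download a copy of the model.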

Azure CLI
Python

wget https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/model.zip
mkdir -p imagenet-classifier
unzip model.zip -d imagenet-classifier

import os
import urllib.request
from zipfile import ZipFile

response = urllib.request.urlretrieve('https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/model.zip', 'model.zip')

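os.makedirs("imagenet-classifier", exist_ok=True)
with ZipFile(response[0], 'r') as zip_file:
    zip_file.extractall(path="imagenet-classifier")

# extractall returns None, so point model_path at the extracted folder.
# This assumes the archive contains a top-level "model" folder, as the scoring script expects.
model_path = os.path.join("imagenet-classifier", "model")

Register the model.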

Azure CLI
Python

MODEL_NAME='imagenet-classifier'
az ml model create --name $MODEL_NAME --path "imagenet-classifier/model"

model_name = 'imagenet-classifier'
model = ml_client.models.create_or_update(
    Model(name=model_name, path=model_path, type=AssetTypes.CUSTOM_MODEL)
)

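Create a scoring script

Create a scoring script that reads the images provided by the batch deployment and returns the model's scores.

The init method loads the model by using the keras module in tensorflow.
The run method runs for each mini-batch that the batch deployment provides.
The run method reads one image file at a time.
The run method resizes each image to the size the model expects.
The run method rescales each image to the range [0,1], which is what the model expects.
The script returns the classes and probabilities associated with the predictions.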
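
As a rough illustration of the last point, the class and probability for one image can be derived from the model's raw scores with a softmax followed by an argmax. The sketch below uses only the standard library; the `logits` values are made up, and the helper names are hypothetical rather than part of the scoring script.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (class index, probability) for the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


logits = [1.0, 3.0, 0.5]  # made-up scores for three classes
pred_class, pred_prob = top_prediction(logits)
print(pred_class, round(pred_prob, 3))
```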

This code is the code/score-by-file/batch_driver.py file:

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from os.path import basename
from PIL import Image
from tensorflow.keras.models import load_model


def init():
    global model
    global input_width
    global input_height

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)
    input_width = 244
    input_height = 244


def run(mini_batch):
    results = []

    for image in mini_batch:
        data = Image.open(image).resize(
            (input_width, input_height)
        )  # Read and resize the image
        data = np.array(data) / 255.0  # Normalize
        data_batch = tf.expand_dims(
            data, axis=0
        )  # create a batch of size (1, 244, 244, 3)

        # perform inference
        pred = model.predict(data_batch)

        # Compute probabilities, classes and labels
        pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1)).numpy()
        pred_class = tf.math.argmax(pred, axis=-1).numpy()

        results.append([basename(image), pred_class[0], pred_prob])

    return pd.DataFrame(results)
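
To get a feel for the calling convention before deploying, the init/run contract can be exercised locally. The following is a simplified sketch under stated assumptions: the fake `model`, the file names, and the `split_into_mini_batches` helper are all hypothetical, and the real batch executor distributes mini-batches across workers rather than looping in one process.

```python
from os.path import basename


def init():
    """Runs once per worker; a real script would load the model here."""
    global model
    # Hypothetical stand-in that returns a fake class ID, not a real classifier.
    model = lambda path: len(basename(path)) % 1000


def run(mini_batch):
    """Runs once per mini-batch and returns one result row per input file."""
    return [[basename(path), model(path)] for path in mini_batch]


def split_into_mini_batches(files, mini_batch_size):
    """Yield consecutive chunks of at most mini_batch_size files."""
    for start in range(0, len(files), mini_batch_size):
        yield files[start:start + mini_batch_size]


files = [f"data/img_{i:04d}.JPEG" for i in range(12)]  # hypothetical file names
init()
rows = []
for mini_batch in split_into_mini_batches(files, mini_batch_size=5):
    rows.extend(run(mini_batch))
print(len(rows))
```

Each call to run receives a list of file paths, which is why the scoring script iterates over mini_batch instead of expecting a single path.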

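Tip

Although the deployment provides images in mini-batches, this scoring script processes one image at a time. This is a common pattern, because loading the entire batch and sending it to the model at once can put high memory pressure on the batch executor and cause out-of-memory (OOM) exceptions.

There are cases where processing the whole batch at once enables high throughput for the scoring task. This is true for batch deployments on GPU hardware, where you want high GPU utilization. For a scoring script that takes this approach, see High-throughput deployments.

Note

If you want to deploy a generative model that produces files, learn how to author the scoring script: Customize outputs in batch deployments.

Create the deployment

After you create the scoring script, create a batch deployment for it. Use the following procedure:

Make sure you have a compute cluster where you can create the deployment. This example uses a compute cluster named gpu-cluster. Using a GPU speeds up processing, although it isn't required.

Indicate the environment in which to run the deployment. In this example, the model runs on TensorFlow. Azure Machine Learning already has an environment with the required software installed, so you can reuse it. You need to add a couple of dependencies in a conda file.

Azure CLI
Python

For the Azure CLI, the environment definition is included in the deployment file.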

compute: azureml:gpu-cluster
environment:
  name: tensorflow27-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest

For Python, get a reference to the environment.

environment = Environment(
    name="tensorflow27-cuda11-gpu",
    conda_file="environment/conda.yaml",
    image="mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest",
)

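Create the deployment.

Azure CLI
Python

For the Azure CLI, to create a new deployment under the created endpoint, create a YAML configuration like the following example. For other properties, see the full batch endpoint YAML schema.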

$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
endpoint_name: imagenet-classifier-batch
name: imagenet-classifier-resnetv2
description: A ResNetV2 model architecture for performing ImageNet classification in batch
type: model
model: azureml:imagenet-classifier@latest
compute: azureml:gpu-cluster
environment:
  name: tensorflow27-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code/score-by-file
  scoring_script: batch_driver.py
resources:
  instance_count: 2
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 5
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 300
  error_threshold: -1
  logging_level: info

Create the deployment with the following command:

DEPLOYMENT_NAME="imagenet-classifier-resnetv2"
az ml batch-deployment create -f deployment.yml

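For Python, to create a new deployment with the indicated environment and scoring script, use the following code:

compute_name = "gpu-cluster"  # the compute cluster named in the earlier step
deployment = BatchDeployment(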
    name="imagenet-classifier-resnetv2",
    description="A ResNetV2 model architecture for performing ImageNet classification in batch",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="code/score-by-file",
        scoring_script="batch_driver.py",
    ),
    compute=compute_name,
    instance_count=2,
    max_concurrency_per_instance=1,
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)

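Create the deployment with the following code:

ml_client.batch_deployments.begin_create_or_update(deployment)

Although you can invoke a specific deployment inside an endpoint, you usually want to invoke the endpoint itself and let the endpoint decide which deployment to use. That deployment is called the default deployment.

This approach lets you change the default deployment, and so change the model that serves the endpoint, without changing the contract with the users who invoke the endpoint. Use the following code to update the default deployment:
- Azure Machine Learning CLI
- Azure Machine Learning SDK for Python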
```
az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
```
```
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint)
```
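
The idea of a default deployment can be pictured with a toy model. The class below is a hypothetical in-memory stand-in written for this illustration only; it isn't part of azure-ai-ml, and the deployment names are invented.

```python
from typing import Callable, Dict, Optional


class ToyEndpoint:
    """Hypothetical stand-in for a batch endpoint (not an azure-ai-ml class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.deployments: Dict[str, Callable[[str], str]] = {}
        self.default_deployment: Optional[str] = None

    def invoke(self, data: str, deployment_name: Optional[str] = None) -> str:
        # An explicit deployment_name wins; otherwise use the default.
        target = deployment_name or self.default_deployment
        if target is None:
            raise ValueError("no default deployment is set")
        return self.deployments[target](data)


toy_endpoint = ToyEndpoint("imagenet-classifier-batch")
toy_endpoint.deployments["by-file"] = lambda data: f"by-file scored {data}"
toy_endpoint.deployments["by-batch"] = lambda data: f"by-batch scored {data}"

toy_endpoint.default_deployment = "by-file"
first = toy_endpoint.invoke("images")
toy_endpoint.default_deployment = "by-batch"  # callers keep making the same call
second = toy_endpoint.invoke("images")
pinned = toy_endpoint.invoke("images", deployment_name="by-file")
print(first, second, pinned, sep="\n")
```

The caller's code is identical before and after the switch, which is the contract the default deployment preserves.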

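The batch endpoint is now ready to use.

Test the deployment

To test the endpoint, use a sample of 1,000 images from the original ImageNet dataset. Batch endpoints can only process data that is located in the cloud and is accessible from the Azure Machine Learning workspace. Upload the sample to an Azure Machine Learning data store, and create a data asset that you can use to invoke the endpoint for scoring.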
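
As a back-of-the-envelope check, the following arithmetic shows how 1,000 files would be divided under the settings from the deployment YAML (mini_batch_size: 5, instance_count: 2, max_concurrency_per_instance: 1). It assumes an even split across workers, which is a simplification; the actual scheduling is up to the batch executor.

```python
import math

total_files = 1000                # images in the sample dataset
mini_batch_size = 5               # files handed to each run() call
instance_count = 2                # compute nodes
max_concurrency_per_instance = 1  # parallel workers per node

mini_batches = math.ceil(total_files / mini_batch_size)
workers = instance_count * max_concurrency_per_instance
batches_per_worker = math.ceil(mini_batches / workers)

print(mini_batches, workers, batches_per_worker)  # prints: 200 2 100
```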

Note

Batch endpoints accept data that can be placed in several types of locations.

Download the associated sample data.

Azure CLI
Python

wget https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet/imagenet-1000.zip
unzip imagenet-1000.zip -d data

Note

If wget isn't installed locally, install it or use a browser to get the .zip file.

!wget https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet-1000.zip
!unzip imagenet-1000.zip -d data

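Create a data asset from the downloaded data.

Azure CLI
Python

For the Azure CLI, create a data asset definition in a YAML file named imagenet-sample-unlabeled.yml: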

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: imagenet-sample-unlabeled
description: A sample of 1000 images from the original ImageNet dataset. Download content from https://azuremlexampledata.blob.core.chinacloudapi.cn/data/imagenet-1000.zip.
type: uri_folder
path: data

Create the data asset.

az ml data create -f imagenet-sample-unlabeled.yml

For Python, specify the following values:

data_path = "data"
dataset_name = "imagenet-sample-unlabeled"

imagenet_sample = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="A sample of 1000 images from the original ImageNet dataset",
    name=dataset_name,
)

Create the data asset.

ml_client.data.create_or_update(imagenet_sample)

To get the newly created data asset, use the following code:

imagenet_sample = ml_client.data.get(dataset_name, label="latest")

After the data is uploaded and ready to use, invoke the endpoint.
- Azure CLI
- Python
```
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input azureml:imagenet-sample-unlabeled@latest | jq -r '.name')
```
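Note

If the jq utility isn't installed, see Download jq.
Tip

What's the difference between the inputs and input parameters when you invoke an endpoint?

In general, you can use a dictionary inputs = {} parameter with the invoke method to provide any number of required inputs to a batch endpoint that contains a model deployment or a pipeline deployment.

For a model deployment, you can use the input parameter as a shorter way to specify the input data location for the deployment. This approach works because a model deployment always takes only one data input.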
```
input = Input(type=AssetTypes.URI_FOLDER, path=imagenet_sample.id)
job = ml_client.batch_endpoints.invoke(
   endpoint_name=endpoint.name,
   input=input,
)
```

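Tip

You don't need to specify the deployment name in the invoke operation, because the endpoint automatically routes the job to the default deployment. Because the endpoint has only one deployment, that deployment is the default. You can target a specific deployment by specifying the deployment_name argument or parameter.

A batch job starts as soon as the command returns. You can monitor the status of the job until it finishes.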
- Azure CLI
- Python
```
az ml job show --name $JOB_NAME
```
```
ml_client.jobs.get(job.name)
```
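
If you prefer to wait in a script rather than check manually, a simple polling loop is one option. The sketch below is an assumption-laden illustration: `wait_for_job` is a hypothetical helper, the set of terminal status names is assumed, and the fake status source stands in for something like `lambda: ml_client.jobs.get(job.name).status`.

```python
import time

# Assumed terminal status names; adjust them if your jobs report different values.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30.0, max_polls=120):
    """Call get_status() until it returns a terminal status or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status in time")


# A canned sequence of statuses stands in for a real job.
fake_statuses = iter(["Queued", "Running", "Running", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0)
print(final_status)
```

Passing the status source as a callable keeps the loop independent of the SDK, so it can be tested without a workspace.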

After the job finishes, download the predictions.

Azure CLI
Python

To download the predictions, use the following command or code:

az ml job download --name $JOB_NAME --output-name score --download-path ./

ml_client.jobs.download(name=job.name, output_name='score', download_path='./')

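The predictions look like the following output. For the reader's convenience, the predictions are combined with the labels. To learn more about how to achieve this, see the associated notebook.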

import pandas as pd
score = pd.read_csv("named-outputs/score/predictions.csv", header=None,  names=['file', 'class', 'probabilities'], sep=' ')
score['label'] = score['class'].apply(lambda pred: imagenet_labels[pred])
score

file	class	probabilities	label
n02088094_Afghan_hound.JPEG	161	0.994745	Afghan hound
n02088238_basset.JPEG	162	0.999397	basset
n02088364_beagle.JPEG	165	0.366914	bluetick
n02088466_bloodhound.JPEG	164	0.926464	bloodhound
...	...	...	...
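
Because the predictions file is space-separated with no header, it can also be inspected without pandas. The following sketch parses two rows copied from the table above and flags low-confidence predictions; the `labels` dictionary is a hypothetical two-entry excerpt of the full label list, and the 0.5 threshold is arbitrary.

```python
import csv
import io

# Two rows in the same space-separated layout as predictions.csv.
sample = (
    "n02088094_Afghan_hound.JPEG 161 0.994745\n"
    "n02088364_beagle.JPEG 165 0.366914\n"
)
labels = {161: "Afghan hound", 165: "bluetick"}  # hypothetical excerpt

rows = []
for file_name, class_id, prob in csv.reader(io.StringIO(sample), delimiter=" "):
    rows.append(
        {
            "file": file_name,
            "class": int(class_id),
            "probability": float(prob),
            "label": labels[int(class_id)],
        }
    )

low_confidence = [row["file"] for row in rows if row["probability"] < 0.5]
print(low_confidence)  # prints: ['n02088364_beagle.JPEG']
```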

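High-throughput deployments

As mentioned earlier, the deployment processes one image at a time, even when the batch deployment provides a batch of them. In most cases, this approach is best. It simplifies how the model runs and avoids possible out-of-memory problems. However, in some cases you might want to saturate the underlying hardware as much as possible. That's the case with GPUs, for example.

In those cases, you might want to run inference on the entire batch of data. That approach requires loading the whole set of images into memory and sending them directly to the model. The following example uses TensorFlow to read a batch of images and score them all at once. It also uses TensorFlow ops for any data preprocessing, so the entire pipeline runs on the same device (CPU/GPU).

Warning

Some models have a nonlinear relationship between input size and memory consumption. To avoid out-of-memory exceptions, batch again (as done in this example) or decrease the size of the batches that the batch deployment creates.

Create the scoring script code/score-by-batch/batch_driver.py: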

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import load_model


def init():
    global model
    global input_width
    global input_height

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)
    input_width = 244
    input_height = 244


def decode_img(file_path):
    file = tf.io.read_file(file_path)
    img = tf.io.decode_jpeg(file, channels=3)
    # tf.image.resize expects the size as [height, width]
    img = tf.image.resize(img, [input_height, input_width])
    return img / 255.0


def run(mini_batch):
    images_ds = tf.data.Dataset.from_tensor_slices(mini_batch)
    images_ds = images_ds.map(decode_img).batch(64)

    # perform inference on the whole mini-batch
    pred = model.predict(images_ds)

    # Compute probabilities and classes per image (one row of pred per image)
    pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1), axis=-1).numpy()
    pred_class = tf.math.argmax(pred, axis=-1).numpy()

    # One output row per input file
    return pd.DataFrame(
        {"file": mini_batch, "probability": pred_prob, "class": pred_class}
    )

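This script builds a tensor dataset from the mini-batch that the batch deployment sends. The dataset is preprocessed with the map operation and the decode_img function to obtain the tensors the model expects.
The dataset is batched again (64) to send the data to the model. Use this parameter to control how much information is loaded into memory and sent to the model at once. If you run on a GPU, tune this parameter carefully to reach the maximum GPU usage just before you get an OOM exception.
After the predictions are computed, the tensors are converted to numpy.ndarray.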
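
When a whole batch is scored at once, the model output has one row per image, so the class and probability have to be reduced along the last axis rather than over the entire array. The NumPy sketch below illustrates that per-row reduction; the `summarize` helper, the file names, and the scores are invented for the example.

```python
import numpy as np


def summarize(file_names, logits):
    """Return one (file, class, probability) tuple per image."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    probs = exps / exps.sum(axis=-1, keepdims=True)
    classes = probs.argmax(axis=-1)  # one class index per row
    top_probs = probs.max(axis=-1)   # one probability per row
    return [(f, int(c), float(p)) for f, c, p in zip(file_names, classes, top_probs)]


files = ["a.JPEG", "b.JPEG"]  # hypothetical file names
logits = np.array([[0.1, 2.0, 0.3], [4.0, 0.0, 0.0]])  # made-up scores, 3 classes
results = summarize(files, logits)
print(results)
```

Reducing without an axis argument would collapse the whole batch into a single number, which is why the axis matters as soon as more than one image is scored per call.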

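Create the deployment.

Azure CLI
Python

For the Azure CLI, to create a new deployment under the created endpoint, create a YAML configuration like the following example. For other properties, see the full batch endpoint YAML schema.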

$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
endpoint_name: imagenet-classifier-batch
name: imagenet-classifier-resnetv2
description: A ResNetV2 model architecture for performing ImageNet classification in batch
type: model
model: azureml:imagenet-classifier@latest
compute: azureml:gpu-cluster
environment:
  name: tensorflow27-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code/score-by-batch
  scoring_script: batch_driver.py
resources:
  instance_count: 2
tags:
  device_acceleration: CUDA
  device_batching: 16
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 5
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 300
  error_threshold: -1
  logging_level: info

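Create the deployment with the following command:

az ml batch-deployment create --file deployment-by-batch.yml --endpoint-name $ENDPOINT_NAME --set-default

For Python, to create a new deployment with the indicated environment and scoring script, use the following code: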

deployment = BatchDeployment(
    name="imagenet-classifier-resnetv2",
    description="A ResNetV2 model architecture for performing ImageNet classification in batch",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="code/score-by-batch",
        scoring_script="batch_driver.py",
    ),
    compute=compute_name,
    instance_count=2,
    tags={"device_acceleration": "CUDA", "device_batching": "16"},
    max_concurrency_per_instance=1,
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)

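Create the deployment with the following code:

ml_client.batch_deployments.begin_create_or_update(deployment)

You can use this new deployment with the sample data shown earlier. Remember that to invoke this deployment, you either indicate the name of the deployment in the invocation method or set it as the default deployment.

Considerations for MLflow models processing images

MLflow models in batch endpoints support reading images as input data. Because MLflow deployments don't require a scoring script, keep the following considerations in mind when you use them:

Supported image files include: .png, .jpg, .jpeg, .tiff, .bmp, and .gif.
MLflow models should expect to receive an np.ndarray as input that matches the dimensions of the input image. To support multiple image sizes in each batch, the batch executor invokes the MLflow model once per image file.
MLflow models are highly encouraged to include a signature. If they do, it must be of type TensorSpec. Inputs are reshaped to match the tensor's shape where possible. If no signature is available, tensors of type np.uint8 are inferred.
Models that include a signature and are expected to handle variable image sizes should include a signature that can guarantee it. For instance, the following signature example allows batches of 3-channel images.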

import numpy as np
import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec

input_schema = Schema([
  TensorSpec(np.dtype(np.uint8), (-1, -1, -1, 3)),
])
signature = ModelSignature(inputs=input_schema)

(...)

mlflow.<flavor>.log_model(..., signature=signature)
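
One way to read the shape (-1, -1, -1, 3) is that -1 acts as a wildcard for the batch size, height, and width, while the channel count is fixed at 3. The helper below is a hypothetical illustration of that reading; it isn't MLflow's own validation logic.

```python
def shape_matches(shape, spec):
    """Return True if shape fits spec, treating -1 in spec as 'any size'."""
    if len(shape) != len(spec):
        return False
    return all(expected in (-1, actual) for actual, expected in zip(shape, spec))


spec = (-1, -1, -1, 3)

same_size_batch = shape_matches((8, 244, 244, 3), spec)   # batch of 8 RGB images
single_large = shape_matches((1, 600, 400, 3), spec)      # one larger RGB image
grayscale = shape_matches((1, 244, 244, 1), spec)         # wrong channel count
print(same_size_batch, single_large, grayscale)
```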

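You can find a working example in the Jupyter notebook imagenet-classifier-mlflow.ipynb. For more information about how to use MLflow models in batch deployments, see Using MLflow models in batch deployments.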
