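Progressive rollout of MLflow models to online endpoints

This article shows how to progressively update and deploy MLflow models to online endpoints without causing service disruption. Use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. With this strategy, you can roll out the new version to a small subset of users or requests before rolling it out completely.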

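About this example

Online endpoints have the concepts of endpoint and deployment. An endpoint represents the API that customers use to consume the model, and a deployment is a specific implementation of that API. This distinction lets you decouple the API from the implementation and change the underlying implementation without affecting the consumer. This example uses these concepts to update the deployed model in an endpoint without introducing service disruption.

The model that you deploy is based on the UCI Heart Disease dataset. The database contains 76 attributes, but this example uses a subset of 14. The model tries to predict whether a patient has heart disease. The prediction is an integer, either 0 (no presence) or 1 (presence). The model was trained with an XGBoost classifier, and all the required preprocessing is packaged as a scikit-learn pipeline, which makes this model an end-to-end pipeline that goes from raw data to predictions.

The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without copying or pasting files, clone the repository, and then change directories to sdk/using-mlflow/deploy.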

Follow along in Jupyter Notebooks

You can follow along with this example in a notebook. In the cloned repository, open the notebook mlflow_sdk_online_endpoints_progresive.ipynb.

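Prerequisites

Before you follow the steps in this article, make sure you have the following prerequisites:

An Azure subscription. If you don't have an Azure subscription, create a trial subscription before you begin.
Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the Owner or Contributor role for the Azure Machine Learning workspace, or a custom role that allows Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.

Additionally, you need to:

Install the Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2).

Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow, azureml-mlflow.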
```
pip install mlflow azureml-mlflow
```
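If you aren't running in Azure Machine Learning compute, configure the MLflow tracking URI or the MLflow registry URI to point to the workspace you're working on. See how to configure MLflow for Azure Machine Learning.

Connect to your workspace

First, connect to the Azure Machine Learning workspace where you perform the deployment tasks.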

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

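The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, you connect to the workspace where you perform deployment tasks.

Import the required libraries: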

from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

Configure the workspace details and get a handle to the workspace:

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

Import the required libraries:

import json
import mlflow
import requests
import pandas as pd
from mlflow.deployments import get_deploy_client

Configure the MLflow client and the deployment client:

mlflow_client = mlflow.MlflowClient()
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

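Register the model in the registry

Ensure that your model is registered in the Azure Machine Learning registry. Azure Machine Learning doesn't support deploying unregistered models. You can register a new model by using the MLflow SDK. The equivalent Azure CLI and Python SDK commands are also shown: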

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

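Create an online endpoint

Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.

You can take advantage of this functionality by deploying multiple versions of the same model under the same endpoint. The new deployment initially receives 0% of the traffic. Once you're sure that the new model works correctly, you progressively move traffic from one deployment to the other.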
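To make the traffic split concrete, here is a minimal sketch of weighted routing in plain Python. It doesn't call Azure. The function `route_request` and the deployment name `xgboost-model-2` are hypothetical, and the real routing is done by the service, not by client code.

```python
import random
from collections import Counter


def route_request(traffic, rng):
    """Pick a deployment name with probability proportional to its traffic percentage."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical split: the existing deployment keeps 90%, the new one gets 10%.
traffic = {"default": 90, "xgboost-model-2": 10}
rng = random.Random(0)  # seeded so the simulation is repeatable
counts = Counter(route_request(traffic, rng) for _ in range(10_000))
print(counts)
```

A deployment with a weight of 0 is never selected in this sketch, which mirrors a new deployment that exists under the endpoint but doesn't yet serve live requests.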

Endpoints require a name, which must be unique within the same region. Make sure that you choose a name that doesn't already exist.

ENDPOINT_SUFFIX=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-5} | head -n 1)
ENDPOINT_NAME="heart-classifier-$ENDPOINT_SUFFIX"

import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "heart-classifier-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}")

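Configure the endpoint

Tip

This example uses key-based authentication for simplicity, which is suitable only for development and testing scenarios. For production deployments, use Microsoft Entra token-based authentication (aad_token), which provides stronger security through identity-based access control. For more information, see Authenticate clients for online endpoints.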
endpoint.yml
```
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: heart-classifier-edp
auth_mode: key
```
```
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="An endpoint to serve predictions of the UCI heart disease problem",
    auth_mode="key",
)
```
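You can configure the properties of this endpoint by using a configuration file. The following example sets the endpoint's authentication mode to key: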
```
endpoint_config = {
    "auth_mode": "key",
    "identity": {
        "type": "system_assigned"
    }
}
```
Write this configuration to a JSON file:
```
endpoint_config_path = "endpoint_config.json"
with open(endpoint_config_path, "w") as outfile:
    outfile.write(json.dumps(endpoint_config))
```

Create the endpoint:

az ml online-endpoint create -n $ENDPOINT_NAME -f endpoint.yml

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

endpoint = deployment_client.create_endpoint(
    name=endpoint_name,
    config={"endpoint-config-file": endpoint_config_path},
)

Get the authentication secret for the endpoint:
```
ENDPOINT_SECRET_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME -o tsv --query primaryKey)
```
```
endpoint_secret_key = ml_client.online_endpoints.get_keys(
    name=endpoint_name
).primary_key
```
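This functionality isn't available in the MLflow SDK. Go to Azure Machine Learning studio, navigate to the endpoint, and retrieve the secret key from there.

Create a blue deployment

So far, the endpoint is empty. There are no deployments on it. Create the first deployment by deploying the same model that you registered earlier. This deployment is named "default" and represents the blue deployment.

Configure the deployment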

blue-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

blue_deployment_name = "default"

Configure the hardware requirements of the deployment:

blue_deployment = ManagedOnlineDeployment(
    name=blue_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

blue_deployment_name = "default"

To configure the hardware requirements of the deployment, create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

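Note

The full specification of this configuration is available in the managed online deployment schema (v2).

Write the configuration to a file: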

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the deployment

az ml online-deployment create --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

Tip

The --all-traffic flag in the create command assigns all the traffic to the new deployment.

ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

blue_deployment = deployment_client.create_deployment(
    name=blue_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

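Assign all the traffic to the deployment

So far, the endpoint has one deployment, but none of its traffic is assigned to it. Assign it now.
This step isn't required in the Azure CLI, because the --all-traffic flag was used during creation.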
```
endpoint.traffic = { blue_deployment_name: 100 }
```
```
traffic_config = {"traffic": {blue_deployment_name: 100}}
```
Write the configuration to a file:
```
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
```
Update the endpoint configuration:
This step isn't required in the Azure CLI, because the --all-traffic flag was used during creation.
```
ml_client.begin_create_or_update(endpoint).result()
```
```
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
```

Create a sample input to test the deployment

sample.json

{
    "input_data": {
        "columns": [
            "age",
            "sex",
            "cp",
            "trestbps",
            "chol",
            "fbs",
            "restecg",
            "thalach",
            "exang",
            "oldpeak",
            "slope",
            "ca",
            "thal"
        ],
        "data": [
            [ 48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal" ]
        ]
    }
}
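The request body above pairs a list of column names with rows of values under an `input_data` key. As a minimal sketch, assuming your records are plain Python dictionaries, the same structure can be built with the standard library. The helper `build_request` is hypothetical and isn't part of any SDK.

```python
import json


def build_request(records):
    """Convert a list of row dictionaries into the {"input_data": {...}} payload."""
    if not records:
        raise ValueError("at least one record is required")
    columns = list(records[0])
    # Emit every row in the same column order as the first record.
    data = [[row[col] for col in columns] for row in records]
    return {"input_data": {"columns": columns, "data": data}}


# One patient record that uses the same columns and values as the sample above.
record = {
    "age": 48, "sex": 0, "cp": 3, "trestbps": 130, "chol": 275, "fbs": 0,
    "restecg": 0, "thalach": 139, "exang": 0, "oldpeak": 0.2, "slope": 1,
    "ca": 0, "thal": "normal",
}
payload = build_request([record])
print(json.dumps(payload, indent=2))
```

Writing `json.dumps(payload)` to a file named sample.json yields a request file with the same shape as the one shown above.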

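The following code samples 5 observations from the training dataset, removes the target column (because the model predicts it), and creates a request in the file sample.json that can be used with the model deployment.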

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

with open("sample.json", "w") as f:
    f.write(
        json.dumps(
            {"input_data": json.loads(samples.to_json(orient="split", index=False))}
        )
    )

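The following code samples 5 observations from the training dataset, removes the target column (because the model predicts it), and creates a request.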

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

Test the deployment

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    df=samples
)
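The model returns one integer per input row, 0 (no presence) or 1 (presence). The following is a minimal sketch for turning a response into readable labels. It assumes the response body is a JSON array of integers, which you should confirm against the actual output of your deployment. The helper `describe_predictions` and the sample response string are hypothetical.

```python
import json

# Label text chosen for illustration; the model itself only returns 0 or 1.
LABELS = {0: "no heart disease", 1: "heart disease"}


def describe_predictions(response_text):
    """Parse a JSON array of 0/1 predictions and map each to a label."""
    predictions = json.loads(response_text)
    return [LABELS[int(p)] for p in predictions]


# Hypothetical response body for a five-row request.
print(describe_predictions("[0, 1, 0, 0, 1]"))
```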

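Create a green deployment under the endpoint

Imagine that the development team creates a new version of the model that's ready for production. First test this model, and once you're confident that it works, update the endpoint to route traffic to it.

Register a new model version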

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

Get the version number of the new model:

VERSION=$(az ml model show -n heart-classifier --label latest | jq -r ".version")

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)
version = model.version

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

Configure a new deployment

green-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: xgboost-model
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

Name the deployment as follows:

GREEN_DEPLOYMENT_NAME="xgboost-model-$VERSION"

green_deployment_name = f"xgboost-model-{version}"

Configure the hardware requirements of the deployment:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the argument with_package=True:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    with_package=True,
)

green_deployment_name = f"xgboost-model-{version}"

To configure the hardware requirements of the deployment, create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

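Tip

This example uses the same hardware configuration indicated in the deployment-config-file. However, the configurations don't have to match. You can configure different hardware for different models depending on their requirements.

Write the configuration to a file: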

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create a new deployment

az ml online-deployment create -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

ml_client.online_deployments.begin_create_or_update(green_deployment).result()

new_deployment = deployment_client.create_deployment(
    name=green_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

Test the deployment without changing traffic

az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name $GREEN_DEPLOYMENT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=green_deployment_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    deployment_name=green_deployment_name, 
    df=samples
)

Tip

Notice that the name of the deployment to invoke is now specified.
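If you call the endpoint over REST instead of through the CLI or an SDK, the target deployment is selected with a request header. The following is a minimal sketch that only builds the headers and body and sends nothing. It assumes key-based authentication and assumes the `azureml-model-deployment` header as described in the Azure Machine Learning online endpoint documentation, so verify it for your environment. The helper `build_scoring_call`, the placeholder key, and the deployment name are hypothetical.

```python
import json


def build_scoring_call(key, payload, deployment_name=None):
    """Return (headers, body) for a POST to the endpoint's scoring URI."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    if deployment_name is not None:
        # Route the request to one deployment, regardless of the traffic split.
        headers["azureml-model-deployment"] = deployment_name
    return headers, json.dumps(payload).encode("utf-8")


headers, body = build_scoring_call(
    key="<endpoint-key>",
    payload={"input_data": {"columns": ["age"], "data": [[48]]}},
    deployment_name="xgboost-model-2",
)
print(headers)
# To send it (not executed here):
# requests.post(scoring_uri, data=body, headers=headers)
```

Omitting `deployment_name` leaves the header out, so the request follows the traffic rules configured on the endpoint.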

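Update the traffic progressively

Once you're confident that the new deployment is ready, you can update the traffic to route some of it to the new deployment. Traffic is configured at the endpoint level.

Configure the traffic:

In the Azure CLI, this step isn't required, because the traffic split is passed directly to the update command.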

endpoint.traffic = {blue_deployment_name: 90, green_deployment_name: 10}

traffic_config = {"traffic": {blue_deployment_name: 90, green_deployment_name: 10}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint:

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=90 $GREEN_DEPLOYMENT_NAME=10"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
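A progressive rollout usually repeats this update with increasing percentages. The following is a minimal sketch that produces the sequence of traffic dictionaries for such a rollout. The helper `rollout_schedule`, the step values, and the green deployment name are hypothetical, so choose steps that match your own validation process.

```python
def rollout_schedule(blue, green, green_steps):
    """Return one traffic dictionary per rollout step; each sums to 100."""
    schedule = []
    for pct in green_steps:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        schedule.append({blue: 100 - pct, green: pct})
    return schedule


# Hypothetical rollout: 10%, then 25%, then 50%, then all traffic to green.
for traffic in rollout_schedule("default", "xgboost-model-2", [10, 25, 50, 100]):
    print(traffic)
```

Each dictionary can be assigned to `endpoint.traffic` with the Python SDK, or wrapped as `{"traffic": ...}` and written to traffic_config.json for the MLflow deployment client, as in the steps above.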

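If you decide to switch all the traffic to the new deployment, update the traffic accordingly:

In the Azure CLI, this step isn't required, because the traffic split is passed directly to the update command.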

endpoint.traffic = {blue_deployment_name: 0, green_deployment_name: 100}

traffic_config = {"traffic": {blue_deployment_name: 0, green_deployment_name: 100}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint:

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=0 $GREEN_DEPLOYMENT_NAME=100"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)

Because the old deployment doesn't receive any traffic, you can safely delete it:

az ml online-deployment delete --endpoint-name $ENDPOINT_NAME --name default

ml_client.online_deployments.begin_delete(
    name=blue_deployment_name, 
    endpoint_name=endpoint_name
)

deployment_client.delete_deployment(
    blue_deployment_name, 
    endpoint=endpoint_name
)

Tip

At this point, the former blue deployment has been deleted, and the new green deployment has taken its place.
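The deletion is safe only because the old deployment receives no traffic. As a minimal sketch, that condition can be checked before any delete call is issued. The helper `can_delete` and the deployment names are hypothetical.

```python
def can_delete(deployment_name, traffic):
    """Return True only when the deployment receives no traffic."""
    return traffic.get(deployment_name, 0) == 0


# Hypothetical traffic split after the full switch to the green deployment.
traffic = {"default": 0, "xgboost-model-2": 100}
print(can_delete("default", traffic))
print(can_delete("xgboost-model-2", traffic))
```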

Clean up resources

az ml online-endpoint delete --name $ENDPOINT_NAME --yes

ml_client.online_endpoints.begin_delete(name=endpoint_name)

deployment_client.delete_endpoint(endpoint_name)

Important

Deleting an endpoint also deletes all the deployments under it.

Next steps

Last updated on 2026-04-22
