Azure 机器学习中的自动缩放联机终结点

项目
2024/10/15

适用范围：Azure CLI ml 扩展 v2（最新版）Python SDK azure-ai-ml v2（最新版）

本文介绍如何根据指标和计划配置自动缩放来管理部署中的资源使用情况。自动缩放过程可以自动运行正确的资源量来处理应用程序上的负载。 Azure 机器学习中的联机终结点通过与 Azure Monitor 中的自动缩放功能集成来支持自动缩放。

Azure Monitor 自动缩放允许设置规则，以在满足规则条件时触发一个或多个自动缩放操作。可以配置基于指标的缩放（例如 CPU 利用率大于 70%）、基于计划的缩放（例如高峰时段的缩放规则），或两者的组合。有关详细信息，请参阅 Azure 中的自动缩放概述。

显示自动缩放如何根据需要添加和删除实例的关系图。

目前可以使用 Azure CLI、REST API、Azure 资源管理器、Python SDK 或基于浏览器的 Microsoft Azure 门户来管理自动缩放。

先决条件

一个已部署的终结点。有关详细信息，请参阅使用在线终结点部署机器学习模型并为其评分。
若要使用自动缩放，必须将角色 microsoft.insights/autoscalesettings/write 分配给管理自动缩放的标识。你可以使用允许此操作的任何内置或自定义角色。有关管理 Azure 机器学习角色的一般指南，请参阅管理用户和角色。有关 Azure Monitor 自动缩放设置的详细信息，请参阅 Microsoft.Insights autoscalesettings。
若要使用 Python SDK 管理 Azure Monitor 服务，请使用以下命令安装 azure-mgmt-monitor 包：
```
pip install azure-mgmt-monitor
```

定义自动缩放配置文件

若要为联机终结点启用自动缩放，首先要定义自动缩放配置文件。此配置文件指定默认、最小和最大规模集容量。以下示例演示如何设置默认、最小和最大缩放容量的虚拟机 (VM) 实例数。

适用于：Azure CLI ml 扩展 v2（当前）

设置终结点和部署名称：

# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue

获取部署和终结点的 Azure 资源管理器 ID：

# Note: this is based on the deploy-managed-online-endpoint.sh: it just adds autoscale settings in the end
set -e

#set the endpoint name from the how-to-deploy excercise
# <set_endpoint_deployment_name>
# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue
# </set_endpoint_deployment_name>

export ENDPOINT_NAME=autoscale-endpt-`echo $RANDOM`

# create endpoint and deployment
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
az ml online-endpoint show -n $ENDPOINT_NAME

# <set_other_env_variables>
# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the deployment. todo: change to --query "id"
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-`echo $RANDOM`
# </set_other_env_variables>

# create autoscale settings. Note if you followed the how-to-deploy doc example, the instance count would have been 1. Now after applying this poilcy, it will scale up 2 (since min count and count are 2).
# <create_autoscale_profile>
az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2
# </create_autoscale_profile>

# Add rule to default profile: scale up if cpu util > 70 %
# <scale_out_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2
# </scale_out_on_cpu_util>

# Add rule to default profile: scale down if cpu util < 25 %
# <scale_in_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 25 avg 5m" \
  --scale in 1
# </scale_in_on_cpu_util>

# add rule to default profile: scale up based on avg. request latency (endpoint metric)
# <scale_up_on_request_latency>
az monitor autoscale rule create \
 --autoscale-name $AUTOSCALE_SETTINGS_NAME \
 --condition "RequestLatency > 70 avg 5m" \
 --scale out 1 \
 --resource $ENDPOINT_RESOURCE_ID
# </scale_up_on_request_latency>

# create weekend profile: scale to 2 nodes in weekend
# <weekend_profile>
az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time" 
# </weekend_profile>

# disable the autoscale profile
# <disable_profile>
az monitor autoscale update \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --enabled false
# </disable_profile>

# <delete_endpoint>
# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"

# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
# </delete_endpoint>

创建自动缩放配置文件：

az monitor autoscale create --name $AUTOSCALE_SETTINGS_NAME --resource $DEPLOYMENT_RESOURCE_ID --min-count 2 --max-count 5 --count 2

备注

有关详细信息，请参阅 az 显示器自动缩放参考。

适用范围：Python SDK azure-ai-ml v2（最新版）

导入必要的模块：

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import AutoscaleProfile, ScaleRule, MetricTrigger, ScaleAction, Recurrence, RecurrentSchedule
import random 
import datetime

为工作区、终结点和部署定义变量：

subscription_id = "<YOUR-SUBSCRIPTION-ID>"
resource_group = "<YOUR-RESOURCE-GROUP>"
workspace = "<YOUR-WORKSPACE>"

endpoint_name = "<YOUR-ENDPOINT-NAME>"
deployment_name = "blue"

获取 Azure 机器学习和 Azure Monitor 客户端：

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential, subscription_id, resource_group, workspace
)

mon_client = MonitorManagementClient(
    credential, subscription_id
)

获取终结点和部署对象：

deployment = ml_client.online_deployments.get(
    deployment_name, endpoint_name
)

endpoint = ml_client.online_endpoints.get(
    endpoint_name
)

创建自动缩放配置文件：

# Set a unique name for autoscale settings for this deployment. The following code appends a random number to create a unique name.
autoscale_settings_name = f"autoscale-{endpoint_name}-{deployment_name}-{random.randint(0,1000)}"

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = []
            )
        ]
    }
)

基于部署指标创建横向扩展规则

常见的横向扩展规则是在平均 CPU 负载较高时增加 VM 实例数。以下示例演示如何在 CPU 平均负载超过 70% 5 分钟时分配两个节点（最大节点数）：

适用于：Azure CLI ml 扩展 v2（当前）

# Note: this is based on the deploy-managed-online-endpoint.sh: it just adds autoscale settings in the end
set -e

#set the endpoint name from the how-to-deploy excercise
# <set_endpoint_deployment_name>
# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue
# </set_endpoint_deployment_name>

export ENDPOINT_NAME=autoscale-endpt-`echo $RANDOM`

# create endpoint and deployment
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
az ml online-endpoint show -n $ENDPOINT_NAME

# <set_other_env_variables>
# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the deployment. todo: change to --query "id"
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-`echo $RANDOM`
# </set_other_env_variables>

# create autoscale settings. Note if you followed the how-to-deploy doc example, the instance count would have been 1. Now after applying this poilcy, it will scale up 2 (since min count and count are 2).
# <create_autoscale_profile>
az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2
# </create_autoscale_profile>

# Add rule to default profile: scale up if cpu util > 70 %
# <scale_out_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2
# </scale_out_on_cpu_util>

# Add rule to default profile: scale down if cpu util < 25 %
# <scale_in_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 25 avg 5m" \
  --scale in 1
# </scale_in_on_cpu_util>

# add rule to default profile: scale up based on avg. request latency (endpoint metric)
# <scale_up_on_request_latency>
az monitor autoscale rule create \
 --autoscale-name $AUTOSCALE_SETTINGS_NAME \
 --condition "RequestLatency > 70 avg 5m" \
 --scale out 1 \
 --resource $ENDPOINT_RESOURCE_ID
# </scale_up_on_request_latency>

# create weekend profile: scale to 2 nodes in weekend
# <weekend_profile>
az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time" 
# </weekend_profile>

# disable the autoscale profile
# <disable_profile>
az monitor autoscale update \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --enabled false
# </disable_profile>

# <delete_endpoint>
# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"

# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
# </delete_endpoint>

该规则是 my-scale-settings 配置文件的一部分，其中 autoscale-name 与配置文件的 name 部分匹配。规则 condition 参数的值指示规则在“VM 实例的平均 CPU 消耗量持续 5 分钟超过 70%”时触发。满足条件后，再分配两个 VM 实例。

备注

有关详细信息，请参阅 az 显示器自动缩放 Azure CLI 语法参考。

适用范围：Python SDK azure-ai-ml v2（最新版）

创建规则定义：

rule_scale_out = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri = deployment.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "GreaterThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 70
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 2, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

此规则是指参数 metric_name、time_window 和 time_aggregation 中 CPUUtilizationpercentage 值的最近 5 分钟平均值。当指标的值大于 70 的 threshold 时，部署将再分配两个 VM 实例。

更新 my-scale-settings 配置文件以包含此规则：

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out
                ]
            )
        ]
    }
)

基于部署指标创建横向缩减规则

当平均 CPU 负载较轻时，缩减规则可以减少 VM 实例数。以下示例演示如何将单个节点释放到至少两个节点（如果 CPU 负载持续 5 分钟小于 30%）。

适用于：Azure CLI ml 扩展 v2（当前）

az monitor autoscale rule create --autoscale-name $AUTOSCALE_SETTINGS_NAME --condition "CpuUtilizationPercentage < 25 avg 5m" --scale in 1

适用范围：Python SDK azure-ai-ml v2（最新版）

创建规则定义：

rule_scale_in = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri = deployment.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "LessThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 30
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 1, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

更新 my-scale-settings 配置文件以包含此规则：

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out, 
                    rule_scale_in
                ]
            )
        ]
    }
)

基于终结点指标创建缩放规则

在前面的部分中，你创建了根据部署指标进行横向扩展或横向扩展的规则。还可以创建适用于部署终结点的规则。在本部分中，你将了解如何在请求延迟持续 5 分钟大于平均值 70 毫秒时分配另一个节点。

适用于：Azure CLI ml 扩展 v2（当前）

az monitor autoscale rule create --autoscale-name $AUTOSCALE_SETTINGS_NAME --condition "RequestLatency > 70 avg 5m" --scale out 1 --resource $ENDPOINT_RESOURCE_ID

适用范围：Python SDK azure-ai-ml v2（最新版）

创建规则定义：

rule_scale_out_endpoint = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="RequestLatency",
        metric_resource_uri = endpoint.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "GreaterThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 70
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 1, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

该规则的 metric_resource_uri 字段现在指向的是终结点，而不是部署。

更新 my-scale-settings 配置文件以包含此规则：

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out, 
                    rule_scale_in,
                    rule_scale_out_endpoint
                ]
            )
        ]
    }
)

查找支持的指标的 ID

如果要使用代码中的其他指标通过 Azure CLI 或 SDK 设置自动缩放规则，请参阅“可用指标”中的表格。

基于计划创建缩放规则

还可以创建仅在特定日期或特定时间应用的规则。在本部分中，你将创建一个规则，用于将周末的节点计数设置为 2。

适用于：Azure CLI ml 扩展 v2（当前）

az monitor autoscale profile create --name weekend-profile --autoscale-name $AUTOSCALE_SETTINGS_NAME --min-count 2 --count 2 --max-count 2 --recurrence week sat sun --timezone "Pacific Standard Time"

适用范围：Python SDK azure-ai-ml v2（最新版）

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="Default",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 2,
                    "default" : 2
                },
                recurrence = Recurrence(
                    frequency = "Week", 
                    schedule = RecurrentSchedule(
                        time_zone = "Pacific Standard Time", 
                        days = ["Saturday", "Sunday"], 
                        hours = [], 
                        minutes = []
                    )
                )
            )
        ]
    }
)

启用或禁用自动缩放

可以启用或禁用特定的自动缩放配置文件。

适用于：Azure CLI ml 扩展 v2（当前）

# Note: this is based on the deploy-managed-online-endpoint.sh: it just adds autoscale settings in the end
set -e

#set the endpoint name from the how-to-deploy excercise
# <set_endpoint_deployment_name>
# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue
# </set_endpoint_deployment_name>

export ENDPOINT_NAME=autoscale-endpt-`echo $RANDOM`

# create endpoint and deployment
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
az ml online-endpoint show -n $ENDPOINT_NAME

# <set_other_env_variables>
# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the deployment. todo: change to --query "id"
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-`echo $RANDOM`
# </set_other_env_variables>

# create autoscale settings. Note if you followed the how-to-deploy doc example, the instance count would have been 1. Now after applying this poilcy, it will scale up 2 (since min count and count are 2).
# <create_autoscale_profile>
az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2
# </create_autoscale_profile>

# Add rule to default profile: scale up if cpu util > 70 %
# <scale_out_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2
# </scale_out_on_cpu_util>

# Add rule to default profile: scale down if cpu util < 25 %
# <scale_in_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 25 avg 5m" \
  --scale in 1
# </scale_in_on_cpu_util>

# add rule to default profile: scale up based on avg. request latency (endpoint metric)
# <scale_up_on_request_latency>
az monitor autoscale rule create \
 --autoscale-name $AUTOSCALE_SETTINGS_NAME \
 --condition "RequestLatency > 70 avg 5m" \
 --scale out 1 \
 --resource $ENDPOINT_RESOURCE_ID
# </scale_up_on_request_latency>

# create weekend profile: scale to 2 nodes in weekend
# <weekend_profile>
az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time" 
# </weekend_profile>

# disable the autoscale profile
# <disable_profile>
az monitor autoscale update \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --enabled false
# </disable_profile>

# <delete_endpoint>
# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"

# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
# </delete_endpoint>

适用范围：Python SDK azure-ai-ml v2（最新版）

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "enabled" : False
    }
)

删除资源

如果不打算使用部署，请按照以下步骤删除资源。

适用于：Azure CLI ml 扩展 v2（当前）

# Note: this is based on the deploy-managed-online-endpoint.sh: it just adds autoscale settings in the end
set -e

#set the endpoint name from the how-to-deploy excercise
# <set_endpoint_deployment_name>
# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue
# </set_endpoint_deployment_name>

export ENDPOINT_NAME=autoscale-endpt-`echo $RANDOM`

# create endpoint and deployment
az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
az ml online-endpoint show -n $ENDPOINT_NAME

# <set_other_env_variables>
# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the deployment. todo: change to --query "id"
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-`echo $RANDOM`
# </set_other_env_variables>

# create autoscale settings. Note if you followed the how-to-deploy doc example, the instance count would have been 1. Now after applying this poilcy, it will scale up 2 (since min count and count are 2).
# <create_autoscale_profile>
az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2
# </create_autoscale_profile>

# Add rule to default profile: scale up if cpu util > 70 %
# <scale_out_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2
# </scale_out_on_cpu_util>

# Add rule to default profile: scale down if cpu util < 25 %
# <scale_in_on_cpu_util>
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 25 avg 5m" \
  --scale in 1
# </scale_in_on_cpu_util>

# add rule to default profile: scale up based on avg. request latency (endpoint metric)
# <scale_up_on_request_latency>
az monitor autoscale rule create \
 --autoscale-name $AUTOSCALE_SETTINGS_NAME \
 --condition "RequestLatency > 70 avg 5m" \
 --scale out 1 \
 --resource $ENDPOINT_RESOURCE_ID
# </scale_up_on_request_latency>

# create weekend profile: scale to 2 nodes in weekend
# <weekend_profile>
az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time" 
# </weekend_profile>

# disable the autoscale profile
# <disable_profile>
az monitor autoscale update \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --enabled false
# </disable_profile>

# <delete_endpoint>
# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"

# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
# </delete_endpoint>

适用范围：Python SDK azure-ai-ml v2（最新版）

mon_client.autoscale_settings.delete(
    resource_group, 
    autoscale_settings_name
)

ml_client.online_endpoints.begin_delete(endpoint_name)

AI 技能盛会

通过

Azure 机器学习中的自动缩放联机终结点

先决条件

定义自动缩放配置文件

基于部署指标创建横向扩展规则

基于部署指标创建横向缩减规则

基于终结点指标创建缩放规则

查找支持的指标的 ID

基于计划创建缩放规则

启用或禁用自动缩放

删除资源

其他资源

AI 技能盛会

通过

Azure 机器学习中的自动缩放联机终结点

先决条件

定义自动缩放配置文件

基于部署指标创建横向扩展规则

基于部署指标创建横向缩减规则

基于终结点指标创建缩放规则

查找支持的指标的 ID

基于计划创建缩放规则

启用或禁用自动缩放

删除资源

相关内容

其他资源