Deploy a model to an Azure Kubernetes Service cluster

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

Learn how to use Azure Machine Learning to deploy a model as a web service on Azure Kubernetes Service (AKS). Azure Kubernetes Service is good for high-scale production deployments. Use Azure Kubernetes Service if you need one or more of the following capabilities:

  • Fast response time.
  • Autoscaling of the deployed service.
  • Hardware acceleration options such as GPU and field-programmable gate arrays (FPGA).

Important

Cluster scaling is not provided through the Azure Machine Learning SDK. For more information on scaling the nodes in an AKS cluster, see the Azure Kubernetes Service documentation.

When deploying to Azure Kubernetes Service, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace:

  • Create the AKS cluster using the Azure Machine Learning SDK, the Machine Learning CLI, or Azure Machine Learning studio. This process automatically connects the cluster to the workspace.
  • Attach an existing AKS cluster to your Azure Machine Learning workspace. A cluster can be attached using the Azure Machine Learning SDK, the Machine Learning CLI, or Azure Machine Learning studio.

The AKS cluster and the AML workspace can be in different resource groups.

Important

The creation or attachment process is a one-time task. Once an AKS cluster is connected to the workspace, you can use it for deployments. You can detach or delete the AKS cluster if you no longer need it. Once detached or deleted, you will no longer be able to deploy to the cluster.

Important

We recommend that you debug locally before deploying to the web service. For more information, see Debug Locally.

You can also refer to Azure Machine Learning - Deploy to Local Notebook.

Prerequisites

  • An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning workspace.

  • A machine learning model registered in your workspace. If you don't have a registered model, see How and where to deploy models.

  • The Azure CLI extension for Machine Learning service, the Azure Machine Learning Python SDK, or the Azure Machine Learning Visual Studio Code extension.

  • The Python code snippets in this article assume that the following variables are set:

    • ws - Set to your workspace.
    • model - Set to your registered model.
    • inference_config - Set to the inference configuration for the model.

    For more information on setting these variables, see How and where to deploy models.

  • The CLI snippets in this article assume that you've created an inferenceconfig.json document. For more information on creating this document, see How and where to deploy models.

  • If you need a Standard Load Balancer (SLB) deployed in your cluster instead of a Basic Load Balancer (BLB), create a cluster in the AKS portal/CLI/SDK and then attach it to the AML workspace.

  • If you have an Azure Policy that restricts the creation of public IPs, AKS cluster creation will fail. AKS requires a public IP for egress traffic. This article also provides guidance on locking down egress traffic from the cluster through the public IP, except for a few FQDNs. There are two ways to enable a public IP:

    • The cluster can use the public IP created by default with the BLB or SLB, or
    • The cluster can be created without a public IP, and a public IP is then configured with a firewall and a user-defined route, as documented here.

    The AML control plane does not talk to this public IP. It talks to the AKS control plane for deployments.

  • If you attach an AKS cluster that has an authorized IP range enabled to access the API server, enable the AML control plane IP ranges for the AKS cluster. The AML control plane is deployed across paired regions and deploys inferencing pods on the AKS cluster. Without access to the API server, the inferencing pods cannot be deployed. Use the IP ranges for both of the paired regions when enabling the IP ranges in an AKS cluster.

    Authorized IP ranges only work with Standard Load Balancer.

  • The compute name MUST be unique within a workspace.

    • The name is required and must be between 3 and 24 characters long.
    • Valid characters are upper and lower case letters, digits, and the - character.
    • The name must start with a letter.
    • The name needs to be unique across all existing computes within an Azure region. You will see an alert if the name you choose is not unique.
  • If you want to deploy models to GPU nodes or FPGA nodes (or any specific SKU), you must create a cluster with that specific SKU. There is no support for creating a secondary node pool in an existing cluster and deploying models in the secondary node pool.
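
The compute-name rules listed above (length, allowed characters, first character) can be checked locally before a cluster is created. The following is a minimal sketch, not part of the SDK: the function name is_valid_compute_name is hypothetical, and the check does not cover uniqueness, which only the service can verify.

```python
import re

# Pattern derived from the naming rules above: one leading letter, then
# letters, digits, or hyphens, for a total length of 3 to 24 characters.
_COMPUTE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,23}")


def is_valid_compute_name(name: str) -> bool:
    """Return True if `name` satisfies the documented format rules.

    Hypothetical helper for illustration; uniqueness is not checked.
    """
    return _COMPUTE_NAME.fullmatch(name) is not None


print(is_valid_compute_name("myaks"))   # True
print(is_valid_compute_name("1aks"))    # False: does not start with a letter
print(is_valid_compute_name("ab"))      # False: shorter than 3 characters
```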

Create a new AKS cluster

Time estimate: Approximately 10 minutes.

Creating or attaching an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy. You can have multiple AKS clusters attached to your workspace.

Tip

If you want to secure your AKS cluster using an Azure Virtual Network, you must create the virtual network first. For more information, see Secure experimentation and inference with Azure Virtual Network.

If you want to create an AKS cluster for development, validation, and testing instead of production, you can set the cluster purpose to dev test.

Warning

If you set cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST, the cluster that is created is not suitable for production-level traffic and may increase inference times. Dev/test clusters also do not guarantee fault tolerance. We recommend at least 2 virtual CPUs for dev/test clusters.

The following examples demonstrate how to create a new AKS cluster using the SDK and CLI:

Using the SDK

from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (you can also provide parameters to customize this).
# For example, to create a dev/test cluster, use:
# prov_config = AksCompute.provisioning_configuration(cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)
prov_config = AksCompute.provisioning_configuration()
# Example configuration to use an existing virtual network
# prov_config.vnet_name = "mynetwork"
# prov_config.vnet_resourcegroup_name = "mygroup"
# prov_config.subnet_name = "default"
# prov_config.service_cidr = "10.0.0.0/16"
# prov_config.dns_service_ip = "10.0.0.10"
# prov_config.docker_bridge_cidr = "172.17.0.1/16"

aks_name = 'myaks'
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws,
                                    name = aks_name,
                                    provisioning_configuration = prov_config)

# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)

Important

For provisioning_configuration(), if you pick custom values for agent_count and vm_size, and cluster_purpose is not DEV_TEST, make sure that agent_count multiplied by the number of virtual CPUs of the chosen vm_size is greater than or equal to 12. For example, if you use a vm_size of "Standard_D3_v2", which has 4 virtual CPUs, you should pick an agent_count of 3 or greater.

The Azure Machine Learning SDK does not support scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in Azure Machine Learning studio. You can only change the node count, not the VM size of the cluster.

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

Using the CLI
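
The sizing rule above is simple arithmetic, and a short sketch may make it concrete. The helper below is hypothetical and not part of the SDK; the lookup table contains only the one VM size whose vCPU count is stated in this article, and other sizes would have to be added by hand.

```python
MIN_PROD_VCPUS = 12  # minimum stated above for clusters that are not dev/test

# Hypothetical lookup table for this sketch. Only Standard_D3_v2 (4 vCPUs)
# comes from the text above.
VCPUS_PER_VM = {"Standard_D3_v2": 4}


def meets_vcpu_minimum(agent_count: int, vm_size: str, dev_test: bool = False) -> bool:
    """Check agent_count * vCPUs-per-VM against the 12 vCPU minimum."""
    if dev_test:
        # The 12 vCPU minimum does not apply to dev/test clusters.
        return True
    return agent_count * VCPUS_PER_VM[vm_size] >= MIN_PROD_VCPUS


print(meets_vcpu_minimum(3, "Standard_D3_v2"))  # True: 3 * 4 = 12
print(meets_vcpu_minimum(2, "Standard_D3_v2"))  # False: 2 * 4 = 8
```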

az ml computetarget create aks -n myaks

For more information, see the az ml computetarget create aks reference.

Attach an existing AKS cluster

Time estimate: Approximately 5 minutes.

If you already have an AKS cluster in your Azure subscription, and it is version 1.17 or lower, you can use it to deploy your image.

Tip

The existing AKS cluster can be in an Azure region other than that of your Azure Machine Learning workspace.

If you want to secure your AKS cluster using an Azure Virtual Network, you must create the virtual network first. For more information, see Secure experimentation and inference with Azure Virtual Network.

When attaching an AKS cluster to a workspace, you can define how you will use the cluster by setting the cluster_purpose parameter.

If you do not set the cluster_purpose parameter, or set cluster_purpose = AksCompute.ClusterPurpose.FAST_PROD, the cluster must have at least 12 virtual CPUs available.

If you set cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST, the cluster does not need to have 12 virtual CPUs. We recommend at least 2 virtual CPUs for dev/test. However, a cluster that is configured for dev/test is not suitable for production-level traffic and may increase inference times. Dev/test clusters also do not guarantee fault tolerance.

Warning

Do not create multiple, simultaneous attachments to the same AKS cluster from your workspace. For example, do not attach one AKS cluster to a workspace using two different names. Each new attachment breaks the previously existing attachments.

If you want to re-attach an AKS cluster, for example to change TLS or another cluster configuration setting, you must first remove the existing attachment by using AksCompute.detach().

For more information on creating an AKS cluster using the Azure CLI or portal, see the following articles:

The following examples demonstrate how to attach an existing AKS cluster to your workspace:

Using the SDK

from azureml.core.compute import AksCompute, ComputeTarget
# Set the resource group that contains the AKS cluster and the cluster name
resource_group = 'myresourcegroup'
cluster_name = 'myexistingcluster'

# Attach the cluster to your workspace. If the cluster has fewer than 12 virtual CPUs, use the following instead:
# attach_config = AksCompute.attach_configuration(resource_group = resource_group,
#                                         cluster_name = cluster_name,
#                                         cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)
attach_config = AksCompute.attach_configuration(resource_group = resource_group,
                                         cluster_name = cluster_name)
aks_target = ComputeTarget.attach(ws, 'myaks', attach_config)

# Wait for the attach process to complete
aks_target.wait_for_completion(show_output = True)

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

Using the CLI

To attach an existing cluster using the CLI, you need the resource ID of the existing cluster. To get this value, use the following command. Replace myexistingcluster with the name of your AKS cluster. Replace myresourcegroup with the resource group that contains the cluster:

az aks show -n myexistingcluster -g myresourcegroup --query id

This command returns a value similar to the following text:

/subscriptions/{GUID}/resourcegroups/{myresourcegroup}/providers/Microsoft.ContainerService/managedClusters/{myexistingcluster}

To attach the existing cluster to your workspace, use the following command. Replace aksresourceid with the value returned by the previous command. Replace myresourcegroup with the resource group that contains your workspace. Replace myworkspace with your workspace name.

az ml computetarget attach aks -n myaks -i aksresourceid -g myresourcegroup -w myworkspace

For more information, see the az ml computetarget attach aks reference.
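
The resource ID returned by az aks show has a fixed layout, so it can be split into its parts when a script needs the subscription, resource group, or cluster name separately. The following is a minimal sketch under that assumption; parse_aks_resource_id is a hypothetical helper, and the subscription value in the example is a placeholder.

```python
def parse_aks_resource_id(resource_id: str) -> dict:
    """Split an AKS resource ID into subscription, resource group, and cluster name.

    Hypothetical helper. Expected layout (from the sample output above):
    /subscriptions/<id>/resourcegroups/<group>/providers/
    Microsoft.ContainerService/managedClusters/<cluster>
    """
    # The CLI may print the ID as a quoted JSON string, so strip quotes too.
    parts = resource_id.strip().strip('"').strip("/").split("/")
    if len(parts) != 8:
        raise ValueError(f"unexpected resource ID: {resource_id!r}")
    labels = [p.lower() for p in parts[0::2]]
    expected = ["subscriptions", "resourcegroups", "providers", "managedclusters"]
    if labels != expected or parts[5].lower() != "microsoft.containerservice":
        raise ValueError(f"not an AKS managed cluster ID: {resource_id!r}")
    return {
        "subscription_id": parts[1],
        "resource_group": parts[3],
        "cluster_name": parts[7],
    }


example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourcegroups/myresourcegroup"
    "/providers/Microsoft.ContainerService/managedClusters/myexistingcluster"
)
print(parse_aks_resource_id(example_id)["cluster_name"])  # myexistingcluster
```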

Deploy to AKS

To deploy a model to Azure Kubernetes Service, create a deployment configuration that describes the compute resources needed, for example, the number of cores and the amount of memory. You also need an inference configuration, which describes the environment needed to host the model and web service. For more information on creating the inference configuration, see How and where to deploy models.

Note

The number of models to be deployed is limited to 1,000 models per deployment (per container).

Using the SDK

from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice, Webservice
from azureml.core.model import Model

aks_target = AksCompute(ws,"myaks")
# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config, aks_target)
service.wait_for_deployment(show_output = True)
print(service.state)
print(service.get_logs())

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

Using the CLI

To deploy using the CLI, use the following command. Replace myaks with the name of the AKS compute target. Replace mymodel:1 with the name and version of the registered model. Replace myservice with the name to give this service:

az ml model deploy -ct myaks -m mymodel:1 -n myservice -ic inferenceconfig.json -dc deploymentconfig.json

The entries in the deploymentconfig.json document map to the parameters for AksWebservice.deploy_configuration. The following table describes the mapping between the entities in the JSON document and the parameters for the method:

JSON entity | Method parameter | Description
computeType | NA | The compute target. For AKS, the value must be aks.
autoScaler | NA | Contains configuration elements for autoscale. See the autoscaler table.
  autoscaleEnabled | autoscale_enabled | Whether to enable autoscaling for the web service. If numReplicas = 0, True; otherwise, False.
  minReplicas | autoscale_min_replicas | The minimum number of containers to use when autoscaling this web service. Default, 1.
  maxReplicas | autoscale_max_replicas | The maximum number of containers to use when autoscaling this web service. Default, 10.
  refreshPeriodInSeconds | autoscale_refresh_seconds | How often the autoscaler attempts to scale this web service. Default, 1.
  targetUtilization | autoscale_target_utilization | The target utilization (in percent out of 100) that the autoscaler should attempt to maintain for this web service. Default, 70.
dataCollection | NA | Contains configuration elements for data collection.
  storageEnabled | collect_model_data | Whether to enable model data collection for the web service. Default, False.
authEnabled | auth_enabled | Whether or not to enable key authentication for the web service. tokenAuthEnabled and authEnabled cannot both be True. Default, True.
tokenAuthEnabled | token_auth_enabled | Whether or not to enable token authentication for the web service. tokenAuthEnabled and authEnabled cannot both be True. Default, False.
containerResourceRequirements | NA | Container for the CPU and memory entities.
  cpu | cpu_cores | The number of CPU cores to allocate for this web service. Default, 0.1.
  memoryInGB | memory_gb | The amount of memory (in GB) to allocate for this web service. Default, 0.5.
appInsightsEnabled | enable_app_insights | Whether to enable Application Insights logging for the web service. Default, False.
scoringTimeoutMs | scoring_timeout_ms | A timeout to enforce for scoring calls to the web service. Default, 60000.
maxConcurrentRequestsPerContainer | replica_max_concurrent_requests | The maximum concurrent requests per node for this web service. Default, 1.
maxQueueWaitMs | max_request_wait_time | The maximum time a request will stay in the queue (in milliseconds) before a 503 error is returned. Default, 500.
numReplicas | num_replicas | The number of containers to allocate for this web service. No default value. If this parameter is not set, the autoscaler is enabled by default.
keys | NA | Contains configuration elements for keys.
  primaryKey | primary_key | A primary auth key to use for this web service.
  secondaryKey | secondary_key | A secondary auth key to use for this web service.
gpuCores | gpu_cores | The number of GPU cores to allocate for this web service. Default, 1. Only supports whole number values.
livenessProbeRequirements | NA | Contains configuration elements for liveness probe requirements.
  periodSeconds | period_seconds | How often (in seconds) to perform the liveness probe. Default, 10 seconds. Minimum value is 1.
  initialDelaySeconds | initial_delay_seconds | Number of seconds after the container has started before liveness probes are initiated. Default, 310.
  timeoutSeconds | timeout_seconds | Number of seconds after which the liveness probe times out. Default, 2 seconds. Minimum value is 1.
  successThreshold | success_threshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed. Default, 1. Minimum value is 1.
  failureThreshold | failure_threshold | When a Pod starts and the liveness probe fails, Kubernetes will try failureThreshold times before giving up. Default, 3. Minimum value is 1.
namespace | namespace | The Kubernetes namespace that the web service is deployed into. Up to 63 lowercase alphanumeric ('a'-'z', '0'-'9') and hyphen ('-') characters. The first and last characters can't be hyphens.

The following JSON is an example deployment configuration for use with the CLI:

{
    "computeType": "aks",
    "autoScaler":
    {
        "autoscaleEnabled": true,
        "minReplicas": 1,
        "maxReplicas": 3,
        "refreshPeriodInSeconds": 1,
        "targetUtilization": 70
    },
    "dataCollection":
    {
        "storageEnabled": true
    },
    "authEnabled": true,
    "containerResourceRequirements":
    {
        "cpu": 0.5,
        "memoryInGB": 1.0
    }
}

For more information, see the az ml model deploy reference.
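
To illustrate how the JSON entities line up with the method parameters, the sketch below translates a deploymentconfig.json document into a dictionary of keyword arguments, using only the mapping given in the table above. The function to_method_parameters and the two lookup dictionaries are hypothetical names for this example, not part of the SDK or CLI. The check that authEnabled and tokenAuthEnabled are not both true follows the table's descriptions.

```python
import json

# Top-level JSON entities and their method parameters, copied from the table.
TOP_LEVEL = {
    "authEnabled": "auth_enabled",
    "tokenAuthEnabled": "token_auth_enabled",
    "appInsightsEnabled": "enable_app_insights",
    "scoringTimeoutMs": "scoring_timeout_ms",
    "maxConcurrentRequestsPerContainer": "replica_max_concurrent_requests",
    "maxQueueWaitMs": "max_request_wait_time",
    "numReplicas": "num_replicas",
    "gpuCores": "gpu_cores",
    "namespace": "namespace",
}

# Container entities (NA in the table) and the mapping of their children.
NESTED = {
    "autoScaler": {
        "autoscaleEnabled": "autoscale_enabled",
        "minReplicas": "autoscale_min_replicas",
        "maxReplicas": "autoscale_max_replicas",
        "refreshPeriodInSeconds": "autoscale_refresh_seconds",
        "targetUtilization": "autoscale_target_utilization",
    },
    "dataCollection": {"storageEnabled": "collect_model_data"},
    "containerResourceRequirements": {"cpu": "cpu_cores", "memoryInGB": "memory_gb"},
    "keys": {"primaryKey": "primary_key", "secondaryKey": "secondary_key"},
    "livenessProbeRequirements": {
        "periodSeconds": "period_seconds",
        "initialDelaySeconds": "initial_delay_seconds",
        "timeoutSeconds": "timeout_seconds",
        "successThreshold": "success_threshold",
        "failureThreshold": "failure_threshold",
    },
}


def to_method_parameters(config: dict) -> dict:
    """Translate a deployment configuration document into keyword arguments."""
    if config.get("authEnabled") and config.get("tokenAuthEnabled"):
        raise ValueError("authEnabled and tokenAuthEnabled cannot both be true")
    params = {}
    for entity, value in config.items():
        if entity in TOP_LEVEL:
            params[TOP_LEVEL[entity]] = value
        elif entity in NESTED:
            for child, child_value in value.items():
                params[NESTED[entity][child]] = child_value
        elif entity != "computeType":  # computeType has no method parameter
            raise KeyError(f"unknown JSON entity: {entity}")
    return params


example = json.loads("""
{
    "computeType": "aks",
    "autoScaler": {"autoscaleEnabled": true, "minReplicas": 1, "maxReplicas": 3,
                   "refreshPeriodInSeconds": 1, "targetUtilization": 70},
    "dataCollection": {"storageEnabled": true},
    "authEnabled": true,
    "containerResourceRequirements": {"cpu": 0.5, "memoryInGB": 1.0}
}
""")
print(to_method_parameters(example))
```

The resulting dictionary uses the parameter names from the table, so in principle it could be unpacked into AksWebservice.deploy_configuration; that step is left out here so the sketch runs without the SDK installed.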

Using VS Code

For information on using VS Code, see deploy to AKS via the VS Code extension.

Important

Deploying through VS Code requires the AKS cluster to be created or attached to your workspace in advance.

Understand the deployment processes

The word "deployment" is used in both Kubernetes and Azure Machine Learning. "Deployment" has different meanings in these two contexts. In Kubernetes, a Deployment is a concrete entity, specified with a declarative YAML file. A Kubernetes Deployment has a defined lifecycle and concrete relationships to other Kubernetes entities such as Pods and ReplicaSets. You can learn about Kubernetes from docs and videos at What is Kubernetes?.

In Azure Machine Learning, "deployment" is used in the more general sense of making available and cleaning up your project resources. The steps that Azure Machine Learning considers part of deployment are:

  1. Zipping the files in your project folder, ignoring those specified in .amlignore or .gitignore
  2. Scaling up your compute cluster (relates to Kubernetes)
  3. Building or downloading the dockerfile to the compute node (relates to Kubernetes)
    1. The system calculates a hash of:
    2. The system uses this hash as the key in a lookup of the workspace Azure Container Registry (ACR)
    3. If it is not found, the system looks for a match in the global ACR
    4. If it is not found, the system builds a new image (which will be cached and registered with the workspace ACR)
  4. Downloading your zipped project file to temporary storage on the compute node
  5. Unzipping the project file
  6. The compute node executing python <entry script> <arguments>
  7. Saving logs, model files, and other files written to ./outputs to the storage account associated with the workspace
  8. Scaling down compute, including removing temporary storage (relates to Kubernetes)

When you're using AKS, the scaling up and down of the compute is controlled by Kubernetes, using the dockerfile built or found as described above.
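
The image lookup in step 3 follows a common cache pattern: hash, look up locally, look up globally, build on a miss. The sketch below models the two registries as plain dictionaries to show the order of the checks. It is an illustration only: the inputs to the hash are not listed in this article, so the environment definition used here is a hypothetical stand-in, as are the function names.

```python
import hashlib
import json


def environment_hash(definition: dict) -> str:
    """Hash a definition in a stable way. The hashed fields are hypothetical."""
    canonical = json.dumps(definition, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def resolve_image(definition: dict, workspace_acr: dict, global_acr: dict, build) -> str:
    """Return an image for `definition`, building one only on a double miss."""
    key = environment_hash(definition)
    if key in workspace_acr:          # step 3.2: look in the workspace ACR
        return workspace_acr[key]
    if key in global_acr:             # step 3.3: look in the global ACR
        return global_acr[key]
    image = build(definition)         # step 3.4: build a new image
    workspace_acr[key] = image        # cache and register it with the workspace ACR
    return image


workspace_acr, global_acr, builds = {}, {}, []


def fake_build(definition: dict) -> str:
    """Stand-in for an image build; records each call."""
    builds.append(definition)
    return f"image-{len(builds)}"


env = {"python": "3.6", "packages": ["scikit-learn"]}  # hypothetical definition
first = resolve_image(env, workspace_acr, global_acr, fake_build)
second = resolve_image(env, workspace_acr, global_acr, fake_build)
print(first, second, len(builds))  # image-1 image-1 1
```

The second call finds the image in the workspace registry, so no second build takes place.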

Deploy models to AKS using controlled rollout (preview)

Analyze and promote model versions in a controlled fashion using endpoints. You can deploy up to six versions behind a single endpoint. Endpoints provide the following capabilities:

  • Configure the percentage of scoring traffic sent to each endpoint. For example, route 20% of the traffic to endpoint 'test' and 80% to 'production'.

    Note

    If you do not account for 100% of the traffic, any remaining percentage is routed to the default endpoint version. For example, if you configure endpoint version 'test' to get 10% of the traffic, and 'prod' for 30%, the remaining 60% is sent to the default endpoint version.

    The first endpoint version created is automatically configured as the default. You can change this by setting is_default=True when creating or updating an endpoint version.

  • Tag an endpoint version as either control or treatment. For example, the current production endpoint version might be the control, while potential new models are deployed as treatment versions. After evaluating the performance of the treatment versions, if one outperforms the current control, it might be promoted to the new production/control.

    Note

    You can only have one control. You can have multiple treatments.

You can enable Application Insights to view operational metrics of endpoints and deployed versions.

Create an endpoint

Once you are ready to deploy your models, create a scoring endpoint and deploy your first version. The following example shows how to deploy and create the endpoint using the SDK. The first deployment is defined as the default version, which means that any traffic percentage not assigned to a specific version goes to the default version.

Tip

In the following example, the configuration sets the initial endpoint version to handle 20% of the traffic. Since this is the first endpoint version, it's also the default version. And since there are no other versions for the other 80% of traffic, that traffic is routed to the default as well. Until other versions that take a percentage of traffic are deployed, this one effectively receives 100% of the traffic.

import azureml.core
from azureml.core.model import Model
from azureml.core.webservice import AksEndpoint
from azureml.core.compute import AksCompute
from azureml.core.compute import ComputeTarget
# select a created compute
compute = ComputeTarget(ws, 'myaks')
namespace_name = "endpointnamespace"
# define the endpoint and version name
endpoint_name = "mynewendpoint"
version_name = "versiona"
# create the deployment config and define the scoring traffic percentile for the first deployment
endpoint_deployment_config = AksEndpoint.deploy_configuration(cpu_cores = 0.1, memory_gb = 0.2,
                                                              enable_app_insights = True,
                                                              tags = {'sckitlearn':'demo'},
                                                              description = "testing versions",
                                                              version_name = version_name,
                                                              traffic_percentile = 20)
# deploy the model and endpoint
endpoint = Model.deploy(ws, endpoint_name, [model], inference_config, endpoint_deployment_config, compute)
# Wait for the process to complete
endpoint.wait_for_deployment(True)

Update and add versions to an endpoint

Add another version to your endpoint and configure the scoring traffic percentile going to the version. There are two types of versions: a control version and a treatment version. There can be multiple treatment versions to help compare against a single control version.

Tip

The second version, created by the following code snippet, accepts 10% of traffic. The first version is configured for 20%, so only 30% of the traffic is configured for specific versions. The remaining 70% is sent to the first endpoint version, because it is also the default version.

from azureml.core.webservice import AksEndpoint

# add another model deployment to the same endpoint as above
version_name_add = "versionb"
endpoint.create_version(version_name = version_name_add,
                       inference_config=inference_config,
                       models=[model],
                       tags = {'modelVersion':'b'},
                       description = "my second version",
                       traffic_percentile = 10)
endpoint.wait_for_deployment(True)

Update existing versions or delete them in an endpoint. You can change the version's default type, control type, and traffic percentile. In the following example, the second version increases its traffic to 40% and becomes the default.

Tip

After the following code snippet runs, the second version is the default. It is now configured for 40%, while the original version is still configured for 20%. This means that 40% of traffic is not accounted for by version configurations. The leftover traffic is routed to the second version, because it is now the default. It effectively receives 80% of the traffic.

from azureml.core.webservice import AksEndpoint

# update the version's scoring traffic percentage and if it is a default or control type
endpoint.update_version(version_name=endpoint.versions["versionb"].name,
                       description="my second version update",
                       traffic_percentile=40,
                       is_default=True,
                       is_control_version_type=True)
# Wait for the process to complete before deleting
endpoint.wait_for_deployment(True)
# delete a version in an endpoint
endpoint.delete_version(version_name="versionb")
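
The traffic arithmetic described in the tips of this section can be written down in a few lines: each version gets its configured percentage, and whatever is left over goes to the default version. The following is a minimal sketch of that rule; effective_traffic is a hypothetical helper, not an SDK call.

```python
def effective_traffic(configured: dict, default: str) -> dict:
    """Return the share of traffic each version effectively receives.

    `configured` maps version names to their traffic percentile; any
    percentage not accounted for is added to the `default` version.
    """
    total = sum(configured.values())
    if total > 100:
        raise ValueError("configured traffic exceeds 100%")
    result = dict(configured)
    result[default] = result.get(default, 0) + (100 - total)
    return result


# One version at 20%, which is also the default: it receives everything.
print(effective_traffic({"versiona": 20}, default="versiona"))
# {'versiona': 100}

# Second version added at 10%; the remaining 70% goes to the default.
print(effective_traffic({"versiona": 20, "versionb": 10}, default="versiona"))
# {'versiona': 90, 'versionb': 10}

# Second version raised to 40% and made the default.
print(effective_traffic({"versiona": 20, "versionb": 40}, default="versionb"))
# {'versiona': 20, 'versionb': 80}
```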

Web service authentication

When deploying to Azure Kubernetes Service, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service.

To disable authentication, set the auth_enabled=False parameter when creating the deployment configuration. The following example disables authentication using the SDK:

deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=False)

For information on authenticating from a client application, see Consume an Azure Machine Learning model deployed as a web service.

Authentication with keys

If key authentication is enabled, you can use the get_keys method to retrieve a primary and secondary authentication key:

primary, secondary = service.get_keys()
print(primary)

Important

If you need to regenerate a key, use service.regen_key.
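
On the client side, a key (or a token) is normally sent as a bearer credential in the Authorization header of the scoring request. The sketch below only prepares the headers and body, so it runs without a deployed service. The helper name build_scoring_request and the {"data": ...} payload shape are assumptions for illustration; the actual input format depends on your entry script.

```python
import json


def build_scoring_request(key_or_token: str, payload: dict):
    """Return (headers, body) for a scoring call authenticated with a key or token."""
    headers = {
        "Content-Type": "application/json",
        # The key or token is passed as a bearer credential.
        "Authorization": f"Bearer {key_or_token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return headers, body


# "<primary-key>" stands in for the value returned by service.get_keys().
headers, body = build_scoring_request("<primary-key>", {"data": [[1, 2, 3]]})
print(headers["Authorization"])  # Bearer <primary-key>
```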

Authentication with tokens

To enable token authentication, set the token_auth_enabled=True parameter when you are creating or updating a deployment. The following example enables token authentication using the SDK:

deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, token_auth_enabled=True)

If token authentication is enabled, you can use the get_token method to retrieve a JWT token and that token's expiration time:

token, refresh_by = service.get_token()
print(token)

Important

You will need to request a new token after the token's refresh_by time.

Microsoft strongly recommends that you create your Azure Machine Learning workspace in the same region as your Azure Kubernetes Service cluster. To authenticate with a token, the web service makes a call to the region in which your Azure Machine Learning workspace is created. If your workspace's region is unavailable, you will not be able to fetch a token for your web service, even if your cluster is in a different region than your workspace. This effectively results in token-based authentication being unavailable until your workspace's region is available again. In addition, the greater the distance between your cluster's region and your workspace's region, the longer it takes to fetch a token.

To retrieve a token, you must use the Azure Machine Learning SDK or the az ml service get-access-token command.
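
Because a new token is needed after the refresh_by time, a client that makes many requests may want to cache the token and fetch a new one only when required. The following is a minimal sketch of that idea. TokenCache is a hypothetical class; it assumes the fetch callable returns a (token, refresh_by) pair like service.get_token() and that refresh_by can be compared with the value returned by the injected clock.

```python
from datetime import datetime, timedelta


class TokenCache:
    """Cache a token and fetch a new one once refresh_by has passed."""

    def __init__(self, fetch, now=datetime.now):
        self._fetch = fetch          # callable returning (token, refresh_by)
        self._now = now              # injectable clock, useful for testing
        self._token = None
        self._refresh_by = None

    def get(self) -> str:
        """Return a valid token, refreshing it if necessary."""
        if self._token is None or self._now() >= self._refresh_by:
            self._token, self._refresh_by = self._fetch()
        return self._token


# Demonstration with a fake clock and a fake fetch function.
clock = [datetime(2020, 1, 1, 0, 0)]
calls = []


def fake_fetch():
    """Stand-in for service.get_token(); each token is valid for one hour."""
    calls.append(clock[0])
    return f"token{len(calls)}", clock[0] + timedelta(hours=1)


cache = TokenCache(fake_fetch, now=lambda: clock[0])
print(cache.get())               # token1
print(cache.get())               # token1 (served from the cache)
clock[0] += timedelta(hours=2)   # move past refresh_by
print(cache.get())               # token2
```

With the SDK, the fetch callable would be service.get_token. Whether refresh_by is timezone-aware should be checked before comparing it with a local clock.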

Next steps