Deploy a model to an Azure Kubernetes Service cluster

Learn how to use Azure Machine Learning to deploy a model as a web service on Azure Kubernetes Service (AKS). Azure Kubernetes Service is a good option for high-scale production deployments. Use Azure Kubernetes Service if you need one or more of the following capabilities:

  • Fast response time
  • Autoscaling of the deployed service
  • Logging
  • Model data collection
  • Authentication
  • TLS termination
  • Hardware acceleration options such as GPUs and field-programmable gate arrays (FPGA)

When deploying to Azure Kubernetes Service, you deploy to an AKS cluster that is connected to your workspace. For information on connecting an AKS cluster to your workspace, see Create and attach an Azure Kubernetes Service cluster.

Important

We recommend that you debug locally before deploying to the web service. For more information, see Debug locally.

You can also refer to Azure Machine Learning - Deploy to Local Notebook.

Prerequisites

Understand the deployment processes

The word "deployment" is used in both Kubernetes and Azure Machine Learning, with different meanings in the two contexts. In Kubernetes, a Deployment is a concrete entity, specified with a declarative YAML file. A Kubernetes Deployment has a defined lifecycle and concrete relationships to other Kubernetes entities such as Pods and ReplicaSets. You can learn about Kubernetes from the docs and videos at What is Kubernetes?.

In Azure Machine Learning, "deployment" is used in the more general sense of making your project resources available and cleaning them up. The steps that Azure Machine Learning considers part of deployment are:

  1. Zipping the files in your project folder, ignoring those specified in .amlignore or .gitignore
  2. Scaling up your compute cluster (relates to Kubernetes)
  3. Building or downloading the dockerfile to the compute node (relates to Kubernetes)
    1. The system calculates a hash of the image definition (base image, custom Docker steps, and conda/Python package definitions)
    2. The system uses this hash as the key in a lookup of the workspace Azure Container Registry (ACR)
    3. If it is not found, it looks for a match in the global ACR
    4. If no match is found, the system builds a new image (which is cached and pushed to the workspace ACR)
  4. Downloading your zipped project file to temporary storage on the compute node
  5. Unzipping the project file
  6. The compute node executing python <entry script> <arguments>
  7. Saving logs, model files, and other files written to ./outputs to the storage account associated with the workspace
  8. Scaling down compute, including removing temporary storage (relates to Kubernetes)
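The hash-keyed image lookup in step 3 can be sketched as a simple cache chain. This is an illustrative helper only: the registries are modeled as plain dictionaries mapping hash to image name, whereas the real service talks to Azure Container Registry.

```python
import hashlib

def resolve_image(image_definition, workspace_acr, global_acr):
    """Illustrative sketch of the hash-keyed image lookup in step 3.

    workspace_acr and global_acr are dicts standing in for the two
    registries; a miss in both triggers a build that is then cached
    in the workspace registry.
    """
    key = hashlib.sha256(image_definition.encode()).hexdigest()
    if key in workspace_acr:       # found in the workspace ACR
        return workspace_acr[key]
    if key in global_acr:          # found in the global ACR
        return global_acr[key]
    image = f"image-{key[:12]}"    # no match: build a new image...
    workspace_acr[key] = image     # ...and cache it in the workspace ACR
    return image
```

Because the hash is the cache key, an unchanged environment definition reuses the previously built image on subsequent deployments.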

Azure ML router

The front-end component (azureml-fe) that routes incoming inference requests to deployed services automatically scales as needed. Scaling of azureml-fe is based on the AKS cluster purpose and size (number of nodes). The cluster purpose and node count are configured when you create or attach an AKS cluster. There is one azureml-fe service per cluster, which may be running on multiple pods.

Important

When using a cluster configured as dev-test, the self-scaler is disabled.

Azureml-fe scales both up (vertically) to use more cores, and out (horizontally) to use more pods. Scale-up decisions are based on the time it takes to route incoming inference requests. If this time exceeds the threshold, a scale-up occurs. If the time to route incoming requests continues to exceed the threshold, a scale-out occurs.

When scaling down and in, CPU usage is used. If the CPU usage threshold is met, the front end is scaled down first. If CPU usage drops to the scale-in threshold, a scale-in operation happens. Scaling up and out only occurs if enough cluster resources are available.
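This two-signal policy (routing latency drives scale-up/out, CPU usage drives scale-down/in) can be sketched as a small decision function. The threshold values below are invented for illustration; the real thresholds are internal to azureml-fe.

```python
def fe_scale_action(routing_latency_s, cpu_utilization,
                    latency_threshold_s=0.1,
                    cpu_scale_down_threshold=0.3):
    """Toy decision function for azureml-fe scaling (hypothetical thresholds).

    High routing latency triggers scale-up, and scale-out if it stays high;
    low CPU usage triggers scale-down first, then scale-in.
    """
    if routing_latency_s > latency_threshold_s:
        return "scale up, then out if latency stays high"
    if cpu_utilization < cpu_scale_down_threshold:
        return "scale down, then in if CPU stays low"
    return "steady"
```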

Understand connectivity requirements for the AKS inferencing cluster

When Azure Machine Learning creates or attaches an AKS cluster, the AKS cluster is deployed with one of the following two network models:

  • Kubenet networking - The network resources are typically created and configured as the AKS cluster is deployed.
  • Azure Container Networking Interface (CNI) networking - The AKS cluster is connected to existing virtual network resources and configurations.

For the first network mode, networking is created and configured properly for the Azure Machine Learning service. For the second mode, because the cluster is connected to an existing virtual network (especially when a custom DNS is used for that network), you need to pay extra attention to the connectivity requirements of the AKS inferencing cluster, and ensure DNS resolution and outbound connectivity for AKS inferencing.

The following diagram captures all the connectivity requirements for AKS inferencing. Black arrows represent actual communication, and blue arrows represent the domain names that a customer-controlled DNS should resolve.

Connectivity requirements for AKS inferencing

Overall DNS resolution requirements

DNS resolution within an existing VNet is under your control. The following DNS entries should be resolvable:

  • AKS API server in the form of <cluster>.hcp.<region>.azmk8s.io
  • Microsoft Container Registry (MCR): mcr.microsoft.com
  • Your Azure Container Registry (ACR) in the form of <ACR name>.azurecr.cn
  • Azure Storage account in the form of <account>.table.core.chinacloudapi.cn and <account>.blob.core.chinacloudapi.cn
  • (Optional) For AAD authentication: api.azureml.ms
  • Scoring endpoint domain name, either auto-generated by Azure ML or a custom domain name. The auto-generated domain name looks like: <leaf-domain-label + auto-generated suffix>.<region>.cloudapp.chinacloudapi.cn
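A quick way to verify these entries from inside the virtual network is to try resolving each one. In this sketch the hostnames are placeholders for your own cluster, registry, and storage account, and the resolver is injectable so the check can also be exercised without network access.

```python
import socket

def check_dns(hostnames, resolver=socket.gethostbyname):
    """Return a dict of hostname -> error for entries that fail to resolve."""
    failures = {}
    for host in hostnames:
        try:
            resolver(host)
        except OSError as exc:  # socket.gaierror subclasses OSError
            failures[host] = str(exc)
    return failures

# Placeholder values -- substitute your own cluster, registry, and account names.
required_entries = [
    "mycluster.hcp.chinanorth2.azmk8s.io",   # AKS API server
    "mcr.microsoft.com",                     # Microsoft Container Registry
    "myregistry.azurecr.cn",                 # workspace ACR
    "myaccount.blob.core.chinacloudapi.cn",  # storage account
]
```

Any hostname reported by `check_dns(required_entries)` needs a fix in the custom DNS configuration before inferencing will work.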

Connectivity requirements in chronological order: from cluster creation to model deployment

During AKS create or attach, the Azure ML router (azureml-fe) is deployed into the AKS cluster. To deploy the Azure ML router, the AKS node should be able to:

  • Resolve DNS for the AKS API server
  • Resolve DNS for MCR in order to download docker images for the Azure ML router
  • Download images from MCR, which requires outbound connectivity

Right after azureml-fe is deployed, it attempts to start, which requires it to:

  • Resolve DNS for the AKS API server
  • Query the AKS API server to discover other instances of itself (it is a multi-pod service)
  • Connect to other instances of itself

Once azureml-fe is started, it requires additional connectivity to function properly:

  • Connect to Azure Storage to download dynamic configuration
  • Resolve DNS for the AAD authentication server api.azureml.ms and communicate with it when the deployed service uses AAD authentication
  • Query the AKS API server to discover deployed models
  • Communicate with deployed model pods

At model deployment time, for a successful model deployment the AKS node should be able to:

  • Resolve DNS for your ACR
  • Download images from your ACR
  • Resolve DNS for the Azure blobs where the model is stored
  • Download models from Azure blobs

After the model is deployed and the service starts, azureml-fe automatically discovers it using the AKS API and becomes ready to route requests to it. It must be able to communicate with the model pods.

Note

If the deployed model requires any connectivity (for example, querying an external database or another REST service, or downloading a blob), then both DNS resolution and outbound communication for these services should be enabled.

Deploy to AKS

To deploy a model to Azure Kubernetes Service, create a deployment configuration that describes the compute resources needed, for example, the number of cores and memory. You also need an inference configuration, which describes the environment needed to host the model and web service. For more information on creating the inference configuration, see How and where to deploy models.

Note

The number of models to be deployed is limited to 1,000 models per deployment (per container).

from azureml.core.webservice import AksWebservice, Webservice
from azureml.core.model import Model
from azureml.core.compute import AksCompute

aks_target = AksCompute(ws, "myaks")
# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config, aks_target)
service.wait_for_deployment(show_output = True)
print(service.state)
print(service.get_logs())

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

Autoscaling

The component that handles autoscaling for Azure ML model deployments is azureml-fe, a smart request router. Since all inference requests go through it, it has the data needed to automatically scale the deployed model(s).

Important

  • Do not enable the Kubernetes Horizontal Pod Autoscaler (HPA) for model deployments. Doing so causes the two auto-scaling components to compete with each other. Azureml-fe is designed to auto-scale models deployed by Azure ML, whereas HPA would have to guess or approximate model utilization from a generic metric such as CPU usage or a custom metric configuration.

  • Azureml-fe does not scale the number of nodes in an AKS cluster, because that could lead to unexpected cost increases. Instead, it scales the number of replicas for the model within the physical cluster boundaries. If you need to scale the number of nodes within the cluster, you can manually scale the cluster or configure the AKS cluster autoscaler.

Autoscaling can be controlled by setting autoscale_target_utilization, autoscale_min_replicas, and autoscale_max_replicas for the AKS web service. The following example demonstrates how to enable autoscaling:

aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True, 
                                                autoscale_target_utilization=30,
                                                autoscale_min_replicas=1,
                                                autoscale_max_replicas=4)

Decisions to scale up or down are based on the utilization of the current container replicas. The number of replicas that are busy (processing a request) divided by the total number of current replicas is the current utilization. If this number exceeds autoscale_target_utilization, more replicas are created. If it is lower, replicas are reduced. By default, the target utilization is 70%.

Decisions to add replicas are eager and fast (around 1 second). Decisions to remove replicas are conservative (around 1 minute).
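The utilization rule above reduces to a small decision function. This is a simplified sketch: the real autoscaler also applies the eager/conservative timing just described before acting.

```python
def autoscale_decision(busy_replicas, total_replicas, target_utilization=0.7):
    """Compare current utilization (busy / total) against the target."""
    utilization = busy_replicas / total_replicas
    if utilization > target_utilization:
        return "add replicas"
    if utilization < target_utilization:
        return "remove replicas"
    return "hold"
```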

You can calculate the required replicas by using the following code:

from math import ceil
# target requests per second
targetRps = 20
# time to process the request (in seconds)
reqTime = 10
# Maximum requests per container
maxReqPerContainer = 1
# target_utilization. 70% in this example
targetUtilization = .7

concurrentRequests = targetRps * reqTime / targetUtilization

# Number of container replicas
replicas = ceil(concurrentRequests / maxReqPerContainer)
print(replicas)  # 286 with the example values above

For more information on setting autoscale_target_utilization, autoscale_max_replicas, and autoscale_min_replicas, see the AksWebservice module reference.

Deploy models to AKS using controlled rollout (preview)

Analyze and promote model versions in a controlled fashion using endpoints. You can deploy up to six versions behind a single endpoint. Endpoints provide the following capabilities:

  • Configure the percentage of scoring traffic sent to each endpoint version. For example, route 20% of the traffic to endpoint 'test' and 80% to 'production'.

    Note

    If you do not account for 100% of the traffic, any remaining percentage is routed to the default endpoint version. For example, if you configure endpoint version 'test' to get 10% of the traffic and 'prod' to get 30%, the remaining 60% is sent to the default endpoint version.

    The first endpoint version created is automatically configured as the default. You can change this by setting is_default=True when creating or updating an endpoint version.

  • Tag an endpoint version as either control or treatment. For example, the current production endpoint version might be the control, while potential new models are deployed as treatment versions. After evaluating the performance of the treatment versions, if one outperforms the current control, it might be promoted to become the new production/control.

    Note

    You can only have one control version. You can have multiple treatment versions.

You can enable App Insights to view operational metrics of endpoints and deployed versions.
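The traffic-routing rules above can be summarized in a small helper. This is illustrative only; the service computes the split server-side.

```python
def effective_traffic(configured_percentiles, default_version):
    """Return the effective traffic split per version.

    Any percentage not assigned to a specific version is routed to the
    default endpoint version.
    """
    split = dict(configured_percentiles)
    remainder = 100 - sum(split.values())
    split[default_version] = split.get(default_version, 0) + remainder
    return split
```

For instance, with 'test' at 10%, 'prod' at 30%, and 'test' as the default, 'test' ends up receiving the unassigned 60% on top of its own 10%.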

Create an endpoint

Once you are ready to deploy your models, create a scoring endpoint and deploy your first version. The following example shows how to deploy and create the endpoint using the SDK. The first deployment is defined as the default version, which means that any traffic percentile left unspecified across all versions goes to the default version.

Tip

In the following example, the configuration sets the initial endpoint version to handle 20% of the traffic. Since this is the first endpoint version, it's also the default. And since we don't have any other versions for the other 80% of the traffic, that traffic is routed to the default as well. Until other versions that take a percentage of traffic are deployed, this one effectively receives 100% of the traffic.

import azureml.core
from azureml.core.webservice import AksEndpoint
from azureml.core.compute import AksCompute
from azureml.core.compute import ComputeTarget
from azureml.core.model import Model

# select a created compute
compute = ComputeTarget(ws, 'myaks')

# define the endpoint and version name
endpoint_name = "mynewendpoint"
version_name = "versiona"
# create the deployment config and define the scoring traffic percentile for the first deployment
endpoint_deployment_config = AksEndpoint.deploy_configuration(cpu_cores = 0.1, memory_gb = 0.2,
                                                              enable_app_insights = True,
                                                              tags = {'sckitlearn':'demo'},
                                                              description = "testing versions",
                                                              version_name = version_name,
                                                              traffic_percentile = 20)
# deploy the model and endpoint
endpoint = Model.deploy(ws, endpoint_name, [model], inference_config, endpoint_deployment_config, compute)
# wait for the process to complete
endpoint.wait_for_deployment(True)

Update and add versions to an endpoint

Add another version to your endpoint and configure the scoring traffic percentile going to that version. There are two types of versions: control and treatment. There can be multiple treatment versions to help compare against a single control version.

Tip

The second version, created by the following code snippet, accepts 10% of the traffic. The first version is configured for 20%, so only 30% of the traffic is configured for specific versions. The remaining 70% is sent to the first endpoint version, because it is also the default version.

from azureml.core.webservice import AksEndpoint

# add another model deployment to the same endpoint as above
version_name_add = "versionb"
endpoint.create_version(version_name = version_name_add,
                       inference_config=inference_config,
                       models=[model],
                       tags = {'modelVersion':'b'},
                       description = "my second version",
                       traffic_percentile = 10)
endpoint.wait_for_deployment(True)

Update existing versions or delete them in an endpoint. You can change a version's default type, control type, and traffic percentile. In the following example, the second version increases its traffic to 40% and is now the default.

Tip

After the following code snippet runs, the second version is the default. It is now configured for 40%, while the original version is still configured for 20%. This means that 40% of the traffic is not accounted for by version configurations. That leftover traffic is routed to the second version, because it is now the default, so it effectively receives 80% of the traffic.

from azureml.core.webservice import AksEndpoint

# update the version's scoring traffic percentage and if it is a default or control type
endpoint.update_version(version_name=endpoint.versions["versionb"].name,
                       description="my second version update",
                       traffic_percentile=40,
                       is_default=True,
                       is_control_version_type=True)
# Wait for the process to complete before deleting
endpoint.wait_for_deployment(True)
# delete a version in an endpoint
endpoint.delete_version(version_name="versionb")

Web service authentication

When deploying to Azure Kubernetes Service, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service.

To disable authentication, set the auth_enabled=False parameter when creating the deployment configuration. The following example disables authentication using the SDK:

deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=False)

For information on authenticating from a client application, see Consume an Azure Machine Learning model deployed as a web service.

Authentication with keys

If key authentication is enabled, you can use the get_keys method to retrieve a primary and secondary authentication key:

primary, secondary = service.get_keys()
print(primary)

Important

If you need to regenerate a key, use service.regen_key.

Authentication with tokens

To enable token authentication, set the token_auth_enabled=True parameter when you create or update a deployment. The following example enables token authentication using the SDK:

deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, token_auth_enabled=True)

If token authentication is enabled, you can use the get_token method to retrieve a JWT token and that token's expiration time:

token, refresh_by = service.get_token()
print(token)

Important

You will need to request a new token after the token's refresh_by time.

Microsoft strongly recommends that you create your Azure Machine Learning workspace in the same region as your Azure Kubernetes Service cluster. To authenticate with a token, the web service makes a call to the region in which your Azure Machine Learning workspace is created. If your workspace's region is unavailable, you will not be able to fetch a token for your web service, even if your cluster is in a different region than your workspace. This effectively makes token-based authentication unavailable until your workspace's region is available again. In addition, the greater the distance between your cluster's region and your workspace's region, the longer it takes to fetch a token.

To retrieve a token, you must use the Azure Machine Learning SDK or the az ml service get-access-token command.

Vulnerability scanning

Azure Security Center provides unified security management and advanced threat protection across hybrid cloud workloads. You should allow Azure Security Center to scan your resources and follow its recommendations. For more information, see Azure Kubernetes Service integration with Security Center.

Next steps