Secure an Azure Machine Learning inferencing environment with virtual networks

In this article, you learn how to secure inferencing environments with a virtual network in Azure Machine Learning.

This article is part three of a four-part series that walks you through securing an Azure Machine Learning workflow.

See the other articles in this series:

1. Secure the workspace > 2. Secure the training environment > 3. Secure the inferencing environment > 4. Enable studio functionality

In this article, you learn how to secure the following inferencing resources in a virtual network:

  • Default Azure Kubernetes Service (AKS) cluster
  • Private AKS cluster
  • Azure Container Instances (ACI)

Prerequisites

  • An existing virtual network and subnet to use with your compute resources.

  • To deploy resources into a virtual network or subnet, your user account must have permissions for the following actions in Azure role-based access control (RBAC):

    • "Microsoft.Network/virtualNetworks/join/action" on the virtual network resource.
    • "Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet resource.

    For more information on RBAC with networking, see the Networking built-in roles.

Azure Kubernetes Service

To use an AKS cluster in a virtual network, the following network requirements must be met:

To add AKS in a virtual network to your workspace, use the following steps:

  1. Sign in to Azure Machine Learning studio, and then select your subscription and workspace.

  2. Select Compute on the left.

  3. Select Inference clusters from the center, and then select +.

  4. In the New Inference Cluster dialog, select Advanced under Network configuration.

  5. To configure this compute resource to use a virtual network, perform the following actions:

    1. In the Resource group drop-down list, select the resource group that contains the virtual network.
    2. In the Virtual network drop-down list, select the virtual network that contains the subnet.
    3. In the Subnet drop-down list, select the subnet.
    4. In the Kubernetes Service address range box, enter the Kubernetes service address range. This address range uses a Classless Inter-Domain Routing (CIDR) notation IP range to define the IP addresses that are available for the cluster. It must not overlap with any subnet IP ranges (for example, 10.0.0.0/16).
    5. In the Kubernetes DNS service IP address box, enter the Kubernetes DNS service IP address. This IP address is assigned to the Kubernetes DNS service. It must be within the Kubernetes service address range (for example, 10.0.0.10).
    6. In the Docker bridge address box, enter the Docker bridge address. This IP address is assigned to the Docker bridge. It must not be in any subnet IP range or in the Kubernetes service address range (for example, 172.17.0.1/16).

    Azure Machine Learning: Machine Learning Compute virtual network settings

  6. When you deploy a model as a web service to AKS, a scoring endpoint is created to handle inferencing requests. If you want to call the scoring endpoint from outside the virtual network, make sure that the network security group (NSG) that controls the virtual network has an inbound security rule enabled for the IP address of the scoring endpoint.

    To find the IP address of the scoring endpoint, look at the scoring URI for the deployed service. For information on viewing the scoring URI, see Consume a model deployed as a web service.

    Important

    Keep the default outbound rules for the NSG. For more information, see the default security rules in Security groups.

    An inbound security rule

    Important

    The IP address shown in the image for the scoring endpoint will be different for your deployments. While the same IP is shared by all deployments to one AKS cluster, each AKS cluster has a different IP address.
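As a minimal sketch of the lookup described in step 6, the following Python snippet extracts the host from a scoring URI and checks that it is a literal IP address. The function name `scoring_endpoint_ip` and the example URI are hypothetical and used only for illustration; with the SDK, the URI would typically come from the deployed service object, which is an assumption here rather than something shown in this article.

```python
import ipaddress
from urllib.parse import urlparse


def scoring_endpoint_ip(scoring_uri: str) -> str:
    """Return the host of a scoring URI, provided it is a literal IP address."""
    host = urlparse(scoring_uri).hostname
    if host is None:
        raise ValueError(f"No host found in scoring URI: {scoring_uri!r}")
    # Raises ValueError if the host is a DNS name rather than an IP address.
    ipaddress.ip_address(host)
    return host


# Hypothetical scoring URI, for illustration only.
example_uri = "http://203.0.113.10:80/api/v1/service/myservice/score"
print(scoring_endpoint_ip(example_uri))
```

The returned address is the value to allow in the inbound security rule of the NSG.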

You can also use the Azure Machine Learning SDK to add Azure Kubernetes Service in a virtual network. If you already have an AKS cluster in a virtual network, attach it to the workspace as described in How to deploy to AKS. The following code creates a new AKS instance in the default subnet of a virtual network named mynetwork:

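from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AksCompute

# Load the workspace (assumes a workspace config.json file is available locally)
ws = Workspace.from_config()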

# Create the compute configuration and set virtual network information
config = AksCompute.provisioning_configuration(location="chinaeast2")
config.vnet_resourcegroup_name = "mygroup"
config.vnet_name = "mynetwork"
config.subnet_name = "default"
config.service_cidr = "10.0.0.0/16"
config.dns_service_ip = "10.0.0.10"
config.docker_bridge_cidr = "172.17.0.1/16"

# Create the compute target
aks_target = ComputeTarget.create(workspace=ws,
                                  name="myaks",
                                  provisioning_configuration=config)

When the creation process is completed, you can run inference, or model scoring, on an AKS cluster behind a virtual network. For more information, see How to deploy to AKS.
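The address constraints for the Kubernetes service address range, the DNS service IP address, and the Docker bridge address can be checked before creating the cluster. The following is a minimal sketch that uses only the Python standard library; the function name `check_aks_network_settings` and the subnet range `10.1.0.0/24` are assumptions made for illustration and are not part of the Azure Machine Learning SDK.

```python
import ipaddress


def check_aks_network_settings(subnet_cidrs, service_cidr, dns_service_ip, docker_bridge_cidr):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    service_net = ipaddress.ip_network(service_cidr)
    # The Docker bridge value is an address with a prefix (for example 172.17.0.1/16),
    # so strict=False is needed to derive the network that contains it.
    bridge_net = ipaddress.ip_network(docker_bridge_cidr, strict=False)
    dns_ip = ipaddress.ip_address(dns_service_ip)

    for subnet in subnets:
        if service_net.overlaps(subnet):
            problems.append(f"service range {service_net} overlaps subnet {subnet}")
        if bridge_net.overlaps(subnet):
            problems.append(f"Docker bridge range {bridge_net} overlaps subnet {subnet}")
    if dns_ip not in service_net:
        problems.append(f"DNS service IP {dns_ip} is outside service range {service_net}")
    if bridge_net.overlaps(service_net):
        problems.append(f"Docker bridge range {bridge_net} overlaps service range {service_net}")
    return problems


# Example values from this article, plus a hypothetical subnet range.
issues = check_aks_network_settings(
    subnet_cidrs=["10.1.0.0/24"],
    service_cidr="10.0.0.0/16",
    dns_service_ip="10.0.0.10",
    docker_bridge_cidr="172.17.0.1/16",
)
print(issues or "No conflicts found")
```

A check like this only validates the arithmetic of the ranges; it does not confirm that the subnet exists or that it has enough free addresses.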

Network contributor role

Important

If you create or attach an AKS cluster by providing a virtual network you previously created, you must grant the service principal (SP) or managed identity for your AKS cluster the Network Contributor role on the resource group that contains the virtual network.

To add the identity as network contributor, use the following steps:

  1. To find the service principal or managed identity ID for AKS, use the following Azure CLI commands. Replace <aks-cluster-name> with the name of the cluster. Replace <resource-group-name> with the name of the resource group that contains the AKS cluster:

    az aks show -n <aks-cluster-name> --resource-group <resource-group-name> --query servicePrincipalProfile.clientId
    

    If this command returns a value of msi, use the following command to identify the principal ID for the managed identity:

    az aks show -n <aks-cluster-name> --resource-group <resource-group-name> --query identity.principalId
    
  2. To find the ID of the resource group that contains your virtual network, use the following command. Replace <resource-group-name> with the name of the resource group that contains the virtual network:

    az group show -n <resource-group-name> --query id
    
  3. To add the service principal or managed identity as a network contributor, use the following command. Replace <SP-or-managed-identity> with the ID returned for the service principal or managed identity. Replace <resource-group-id> with the ID returned for the resource group that contains the virtual network:

    az role assignment create --assignee <SP-or-managed-identity> --role 'Network Contributor' --scope <resource-group-id>
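The decision in step 1 (use the managed identity when the client ID query returns msi) and the command in step 3 can be sketched in Python as follows. The helper names `resolve_assignee` and `role_assignment_command` and the placeholder IDs are hypothetical; the snippet only assembles the `az role assignment create` command shown above and does not run it.

```python
import shlex
from typing import List, Optional


def resolve_assignee(client_id: str, principal_id: Optional[str] = None) -> str:
    """Pick the ID to assign the role to, following step 1."""
    if client_id != "msi":
        # The cluster uses a service principal; its client ID is the assignee.
        return client_id
    if not principal_id:
        raise ValueError("The cluster uses a managed identity; its principal ID is required")
    return principal_id


def role_assignment_command(assignee: str, resource_group_id: str) -> List[str]:
    """Build the argument list for the command in step 3."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,
        "--role", "Network Contributor",
        "--scope", resource_group_id,
    ]


# Placeholder values, for illustration only.
assignee = resolve_assignee("msi", "00000000-0000-0000-0000-000000000000")
command = role_assignment_command(
    assignee,
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>",
)
print(shlex.join(command))
```

Building the command as a list keeps the role name, which contains a space, as a single argument if the list is later passed to a process launcher.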
    

For more information on using the internal load balancer with AKS, see Use internal load balancer with Azure Kubernetes Service.

Secure VNet traffic

There are two approaches to isolate traffic to and from the AKS cluster to the virtual network:

  • Private AKS cluster: This approach uses Azure Private Link to secure communications with the cluster for deployment/management operations.
  • Internal AKS load balancer: This approach configures the endpoint for your deployments to AKS to use a private IP within the virtual network.

Warning

An internal load balancer does not work with an AKS cluster that uses kubenet. If you want to use an internal load balancer and a private AKS cluster at the same time, configure your private AKS cluster with Azure Container Networking Interface (CNI). For more information, see Configure Azure CNI networking in Azure Kubernetes Service.

Private AKS cluster

By default, AKS clusters have a control plane, or API server, with public IP addresses. You can configure AKS to use a private control plane by creating a private AKS cluster. For more information, see Create a private Azure Kubernetes Service cluster.

After you create the private AKS cluster, attach the cluster to the virtual network to use with Azure Machine Learning.

Important

Before using a private link enabled AKS cluster with Azure Machine Learning, you must open a support incident to enable this functionality. For more information, see Manage and increase quotas.

Internal AKS load balancer

By default, AKS deployments use a public load balancer. In this section, you learn how to configure AKS to use an internal load balancer. An internal (or private) load balancer is used where only private IPs are allowed as the frontend. Internal load balancers are used to load balance traffic inside a virtual network.

A private load balancer is enabled by configuring AKS to use an internal load balancer.

Enable private load balancer

Important

You cannot enable private IP when creating the Azure Kubernetes Service cluster in Azure Machine Learning studio. You can create a cluster with an internal load balancer when using the Python SDK or the Azure CLI extension for machine learning.

The following example demonstrates how to create a new AKS cluster with a private IP/internal load balancer using the SDK:

from azureml.core.compute import AksCompute, ComputeTarget
from azureml.exceptions import ComputeTargetException

# ws = workspace object. Creation not shown in this snippet
# Name of the AKS cluster to look up or create
aks_cluster_name = "myaks"

# Verify that cluster does not exist already
try:
    aks_target = AksCompute(workspace=ws, name=aks_cluster_name)
    print("Found existing aks cluster")

except ComputeTargetException:
    print("Creating new aks cluster")

    # Subnet to use for AKS
    subnet_name = "default"
    # Create AKS configuration
    prov_config = AksCompute.provisioning_configuration(load_balancer_type="InternalLoadBalancer")
    # Set info for existing virtual network to create the cluster in
    prov_config.vnet_resourcegroup_name = "myvnetresourcegroup"
    prov_config.vnet_name = "myvnetname"
    prov_config.service_cidr = "10.0.0.0/16"
    prov_config.dns_service_ip = "10.0.0.10"
    prov_config.subnet_name = subnet_name
    prov_config.docker_bridge_cidr = "172.17.0.1/16"

    # Create compute target
    aks_target = ComputeTarget.create(workspace=ws, name=aks_cluster_name, provisioning_configuration=prov_config)
    # Wait for the operation to complete
    aks_target.wait_for_completion(show_output=True)

When attaching an existing cluster to your workspace, you must wait until the attach operation has completed before you configure the load balancer. For information on attaching a cluster, see Attach an existing AKS cluster.

After attaching the existing cluster, you can then update the cluster to use an internal load balancer/private IP:

import azureml.core
from azureml.core.compute.aks import AksUpdateConfiguration
from azureml.core.compute import AksCompute

# ws = workspace object. Creation not shown in this snippet
aks_target = AksCompute(ws,"myaks")

# Change to the name of the subnet that contains AKS
subnet_name = "default"
# Update AKS configuration to use an internal load balancer
update_config = AksUpdateConfiguration(None, "InternalLoadBalancer", subnet_name)
aks_target.update(update_config)
# Wait for the operation to complete
aks_target.wait_for_completion(show_output = True)

Enable Azure Container Instances (ACI)

Azure Container Instances are dynamically created when deploying a model. To enable Azure Machine Learning to create ACI inside the virtual network, you must enable subnet delegation for the subnet used by the deployment.

Warning

When using Azure Container Instances in a virtual network, the virtual network must be in the same resource group as your Azure Machine Learning workspace.

When using Azure Container Instances inside the virtual network, the Azure Container Registry (ACR) for your workspace cannot also be in the virtual network.

To use ACI in a virtual network with your workspace, use the following steps:

  1. To enable subnet delegation on your virtual network, use the information in the Add or remove a subnet delegation article. You can enable delegation when creating a virtual network, or add it to an existing network.

    Important

    When enabling delegation, use Microsoft.ContainerInstance/containerGroups as the Delegate subnet to service value.

  2. Deploy the model using AciWebservice.deploy_configuration() with the vnet_name and subnet_name parameters. Set these parameters to the virtual network name and subnet where you enabled delegation.
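Step 2 could look roughly like the following sketch. It assumes the azureml-core package is installed; the wrapper function `build_aci_deployment_config`, the virtual network and subnet names, and the `cpu_cores` and `memory_gb` values are illustrative assumptions, not values prescribed by this article.

```python
def build_aci_deployment_config(vnet_name: str, subnet_name: str):
    """Build an ACI deployment configuration that targets a delegated subnet."""
    if not vnet_name or not subnet_name:
        raise ValueError("Both vnet_name and subnet_name are required")
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.core.webservice import AciWebservice

    return AciWebservice.deploy_configuration(
        cpu_cores=1,   # illustrative sizing only
        memory_gb=1,   # illustrative sizing only
        vnet_name=vnet_name,
        subnet_name=subnet_name,
    )


# Hypothetical names; use the virtual network and the subnet where delegation is enabled.
# deployment_config = build_aci_deployment_config("mynetwork", "default")
```

The resulting configuration object would then be supplied as the deployment configuration when the model is deployed as a web service.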

Limit outbound connectivity from the virtual network

If you don't want to use the default outbound rules and you do want to limit the outbound access of your virtual network, you must allow access to Azure Container Registry. For example, make sure that your network security group (NSG) contains a rule that allows access to the AzureContainerRegistry.{RegionName} service tag, where {RegionName} is the name of an Azure region.

Next steps

This article is part three of a four-part virtual network series. See the rest of the articles to learn how to secure a virtual network: