Use managed identities with Azure Machine Learning (preview)

Managed identities allow you to configure your workspace with the minimum permissions required to access resources.

When configuring an Azure Machine Learning workspace in a trustworthy manner, it is important to ensure that the different services associated with the workspace have the correct level of access. For example, during a machine learning workflow the workspace needs access to Azure Container Registry (ACR) for Docker images, and to storage accounts for training data.

Furthermore, managed identities allow fine-grained control over permissions. For example, you can grant or revoke access from specific compute resources to a specific ACR.

In this article, you'll learn how to use managed identities to:

  • Configure and use ACR for your Azure Machine Learning workspace without having to enable admin user access to ACR.
  • Access a private ACR external to your workspace, to pull base images for training or inference.

Important

Using managed identities to control access to resources with Azure Machine Learning is currently in preview. Preview functionality is provided "as-is", with no guarantee of support or service level agreement. For more information, see the Supplemental terms of use for Microsoft Azure previews.

Prerequisites

Configure managed identities

In some situations, it's necessary to disallow admin user access to Azure Container Registry. For example, the ACR may be shared and you need to disallow admin access by other users. Or, a subscription-level policy may disallow creating an ACR with the admin user enabled.

Important

When using Azure Machine Learning for inference on Azure Container Instance (ACI), admin user access on ACR is required. Do not disable it if you plan on deploying models to ACI for inference.

When you create an ACR without enabling admin user access, managed identities are used to access the ACR to build and pull Docker images.

You can bring your own ACR with the admin user disabled when you create the workspace. Alternatively, let Azure Machine Learning create the workspace ACR and disable the admin user afterwards.

Bring your own ACR

If the ACR admin user is disallowed by subscription policy, you should first create an ACR without the admin user, and then associate it with the workspace. Also, if you have an existing ACR with the admin user disabled, you can attach it to the workspace.

Create the ACR from the Azure CLI without setting the --admin-enabled argument, or from the Azure portal without enabling the admin user. Then, when creating the Azure Machine Learning workspace, specify the Azure resource ID of the ACR. The following example demonstrates creating a new Azure ML workspace that uses an existing ACR:

Tip

To get the value for the --container-registry parameter, use the az acr show command to show information for your ACR. The id field contains the resource ID for your ACR.

az ml workspace create -w <workspace name> \
-g <workspace resource group> \
-l <region> \
--container-registry /subscriptions/<subscription id>/resourceGroups/<acr resource group>/providers/Microsoft.ContainerRegistry/registries/<acr name>
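
The resource ID passed to --container-registry follows a fixed pattern, so it can also be assembled programmatically. The following is a minimal sketch. The helper name acr_resource_id and the sample values are hypothetical and only illustrate the ID format shown in the command above.

```python
def acr_resource_id(subscription_id: str, resource_group: str, registry_name: str) -> str:
    """Assemble the Azure resource ID of a container registry.

    Hypothetical helper: it only formats the ID pattern used with
    --container-registry; it does not check that the registry exists.
    """
    for label, value in (("subscription_id", subscription_id),
                         ("resource_group", resource_group),
                         ("registry_name", registry_name)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty string without '/'")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.ContainerRegistry"
        f"/registries/{registry_name}"
    )


if __name__ == "__main__":
    # Placeholder values, not real resources.
    print(acr_resource_id("00000000-0000-0000-0000-000000000000", "my-acr-rg", "myacr"))
```

When the registry already exists, reading the id field from az acr show remains the more reliable source of this value.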

Let Azure Machine Learning service create workspace ACR

If you do not bring your own ACR, Azure Machine Learning service will create one for you when you perform an operation that needs one, for example, submitting a training run to Machine Learning Compute, building an environment, or deploying a web service endpoint. The ACR created by the workspace will have the admin user enabled, and you need to disable the admin user manually.

  1. Create a new workspace:

    az ml workspace create -w <my workspace> -g <my resource group>
    
  2. Perform an action that requires ACR, for example, the tutorial on training a model.

  3. Get the name of the ACR created for the workspace:

    az ml workspace show -w <my workspace> \
    -g <my resource group> \
    --query containerRegistry
    

    This command returns a value similar to the following text. You only need the last portion of the text, which is the ACR instance name:

    /subscriptions/<subscription id>/resourceGroups/<my resource group>/providers/Microsoft.ContainerRegistry/registries/<ACR instance name>
    
  4. Update the ACR to disable the admin user:

    az acr update --name <ACR instance name> --admin-enabled false
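
Step 3 returns the full resource ID, while step 4 needs only the registry name. The sketch below shows one way to extract that last segment. The function name is hypothetical, and the quote stripping assumes the value was copied from the CLI's default JSON output, where strings are quoted.

```python
def acr_name_from_resource_id(resource_id: str) -> str:
    """Return the registry name, i.e. the last segment of an ACR resource ID."""
    # Remove surrounding whitespace and JSON quotes, then the outer slashes.
    cleaned = resource_id.strip().strip('"').strip("/")
    parts = cleaned.split("/")
    # Expect ".../providers/Microsoft.ContainerRegistry/registries/<name>".
    if (len(parts) < 4
            or parts[-2].lower() != "registries"
            or parts[-3].lower() != "microsoft.containerregistry"):
        raise ValueError(f"Not an ACR resource ID: {resource_id!r}")
    return parts[-1]


if __name__ == "__main__":
    # Placeholder ID, not a real resource.
    sample = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg"
              "/providers/Microsoft.ContainerRegistry/registries/myworkspaceacr")
    print(acr_name_from_resource_id(sample))
```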
    

Create compute with managed identity to access Docker images for training

To access the workspace ACR, create a machine learning compute cluster with system-assigned managed identity enabled. You can enable the identity from the Azure portal or studio when creating the compute, or from the Azure CLI.

When creating a compute cluster with AmlComputeProvisioningConfiguration, use the identity_type parameter to set the managed identity type.

The managed identity is automatically granted the ACRPull role on the workspace ACR to enable pulling Docker images for training.

Note

If you create the compute before the workspace ACR has been created, you have to assign the ACRPull role manually.
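
As an illustration of the identity_type parameter, the following sketch creates a cluster through the azureml-core Python SDK. The VM size, node count, cluster name, and both function names are hypothetical, and Workspace.from_config() assumes a local config.json; treat it as a starting point rather than a definitive implementation.

```python
from typing import Any, Dict, List, Optional


def identity_settings(user_assigned_ids: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build the identity keyword arguments for the provisioning configuration."""
    if user_assigned_ids:
        # User-assigned identities are referenced by their Azure resource IDs.
        return {"identity_type": "UserAssigned", "identity_id": list(user_assigned_ids)}
    return {"identity_type": "SystemAssigned"}


def create_identity_cluster(cluster_name: str = "cpucluster") -> None:
    """Create a cluster with a system-assigned identity (requires azureml-core)."""
    # Imported here so the helper above can be used without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget

    ws = Workspace.from_config()  # assumes a local config.json
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS3_V2",  # hypothetical VM size
        max_nodes=4,                # hypothetical scale limit
        **identity_settings(),
    )
    target = ComputeTarget.create(ws, cluster_name, config)
    target.wait_for_completion(show_output=True)
```

Calling create_identity_cluster() provisions real Azure resources, so it is left uncalled here.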

Access base images from private ACR

By default, Azure Machine Learning uses Docker base images that come from a public repository managed by Microsoft. It then builds your training or inference environment on those images. For more information, see What are ML environments?.

To use a custom base image internal to your enterprise, you can use managed identities to access your private ACR. There are two use cases:

  • Use the base image for training as is.
  • Build an Azure Machine Learning managed image with the custom image as a base.

Pull Docker base image to machine learning compute cluster for training as is

Create a machine learning compute cluster with system-assigned managed identity enabled, as described earlier. Then, determine the principal ID of the managed identity.

az ml computetarget amlcompute identity show --name <cluster name> -w <workspace> -g <resource group>

Optionally, you can update the compute cluster to assign a user-assigned managed identity:

az ml computetarget amlcompute identity assign --name cpucluster \
-w $mlws -g $mlrg --identities <my-identity-id>

To allow the compute cluster to pull the base images, grant the managed identity the ACRPull role on the private ACR:

az role assignment create --assignee <principal ID> \
--role acrpull \
--scope "/subscriptions/<subscription ID>/resourceGroups/<private ACR resource group>/providers/Microsoft.ContainerRegistry/registries/<private ACR name>"
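
When this role assignment has to be repeated for several clusters, the command can be assembled in a script. The sketch below builds the same argument list as the command above; the function name is hypothetical, and only the flags already shown (--assignee, --role, --scope) are used.

```python
import shlex
from typing import List


def acrpull_assignment_command(principal_id: str, subscription_id: str,
                               resource_group: str, registry_name: str) -> List[str]:
    """Return the argument list for granting ACRPull on a private ACR."""
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerRegistry/registries/{registry_name}"
    )
    return ["az", "role", "assignment", "create",
            "--assignee", principal_id,
            "--role", "acrpull",
            "--scope", scope]


if __name__ == "__main__":
    cmd = acrpull_assignment_command("<principal ID>", "<subscription ID>",
                                     "<private ACR resource group>", "<private ACR name>")
    # Print a shell-quoted command for review before running it.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it. Keeping the arguments as a list avoids shell quoting problems in the scope value.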

Finally, when submitting a training run, specify the base image location in the environment definition.

from azureml.core import Environment
env = Environment(name="private-acr")
env.docker.base_image = "<ACR name>.azurecr.io/<base image repository>:<base image version>"
env.python.user_managed_dependencies = True

Important

To ensure that the base image is pulled directly to the compute resource, set user_managed_dependencies = True and do not specify a Dockerfile. Otherwise, Azure Machine Learning service will attempt to build a new Docker image and fail, because only the compute cluster has access to pull the base image from ACR.
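
The rule in the note above can be checked before a run is submitted. The following sketch assumes the azureml-core SDK; the function names, source folder, script name, and experiment name are hypothetical, and Workspace.from_config() assumes a local config.json.

```python
from typing import Optional


def pulls_image_as_is(user_managed_dependencies: bool, dockerfile: Optional[str]) -> bool:
    """True when the base image is pulled directly, without an image build."""
    return bool(user_managed_dependencies) and dockerfile is None


def submit_with_private_base_image(cluster_name: str, image: str) -> None:
    """Submit a run that trains directly in the private base image (requires azureml-core)."""
    # Imported here so the check above can be used without the SDK installed.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

    ws = Workspace.from_config()  # assumes a local config.json
    env = Environment(name="private-acr")
    env.docker.base_image = image
    env.python.user_managed_dependencies = True

    # Guard against settings that would trigger an image build.
    if not pulls_image_as_is(env.python.user_managed_dependencies,
                             env.docker.base_dockerfile):
        raise RuntimeError("Environment would trigger an image build")

    run_config = ScriptRunConfig(
        source_directory="./src",  # hypothetical folder
        script="train.py",         # hypothetical script
        compute_target=ws.compute_targets[cluster_name],
        environment=env,
    )
    Experiment(ws, "private-acr-training").submit(run_config)
```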

Build Azure Machine Learning managed environment on a base image from private ACR for training or inference

In this scenario, Azure Machine Learning service builds the training or inference environment on top of a base image you supply from a private ACR. Because the image build task happens on the workspace ACR using ACR Tasks, you must perform additional steps to allow access.

  1. Create a user-assigned managed identity and grant the identity ACRPull access to the private ACR.

  2. Grant the workspace system-assigned managed identity the Managed Identity Operator role on the user-assigned managed identity from the previous step. This role allows the workspace to assign the user-assigned managed identity to the ACR task that builds the managed environment.

    1. Obtain the principal ID of the workspace system-assigned managed identity:

      az ml workspace show -w <workspace name> -g <resource group> --query identityPrincipalId
      
    2. Grant the Managed Identity Operator role:

      az role assignment create --assignee <principal ID> --role managedidentityoperator --scope <UAI resource ID>
      

      The UAI resource ID is the Azure resource ID of the user-assigned identity, in the format /subscriptions/<subscription ID>/resourceGroups/<resource group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<UAI name>.

  3. Specify the external ACR and the client ID of the user-assigned managed identity in workspace connections by using the Workspace.set_connection method:

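    from azureml.core import Workspace

    # Load the workspace from a local config.json (assumes one is present).
    workspace = Workspace.from_config()

    workspace.set_connection(
        name="privateAcr",
        category="ACR",
        target="<acr url>",
        authType="RegistryConnection",
        value={"ResourceId": "<UAI resource id>", "ClientId": "<UAI client ID>"})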
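
A malformed UAI resource ID is easy to miss when it is pasted into the connection. The sketch below validates the ID against the format given in step 2 and builds the value payload used in step 3. The function name and the regular expression are illustrative assumptions, not part of the SDK.

```python
import re
from typing import Dict

# Format from step 2: /subscriptions/<id>/resourceGroups/<rg>/providers/
# Microsoft.ManagedIdentity/userAssignedIdentities/<name>
_UAI_PATTERN = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.ManagedIdentity/userAssignedIdentities/[^/]+$",
    re.IGNORECASE,
)


def registry_connection_value(uai_resource_id: str, uai_client_id: str) -> Dict[str, str]:
    """Build the 'value' payload for the ACR workspace connection."""
    if not _UAI_PATTERN.match(uai_resource_id):
        raise ValueError(f"Unexpected UAI resource ID format: {uai_resource_id!r}")
    if not uai_client_id:
        raise ValueError("The client ID of the user-assigned identity is required")
    return {"ResourceId": uai_resource_id, "ClientId": uai_client_id}
```

The returned dictionary has the same keys as the value argument in the set_connection call above.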
    

Once the configuration is complete, you can use the base images from the private ACR when building environments for training or inference. The following code snippet demonstrates how to specify the base image ACR and image name in an environment definition:

from azureml.core import Environment

env = Environment(name="my-env")
env.docker.base_image = "<acr url>/my-repo/my-image:latest"

Optionally, you can specify the managed identity resource URL and client ID in the environment definition itself by using RegistryIdentity. If you use the registry identity explicitly, it overrides any workspace connections specified earlier:

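from azureml.core import Environment
from azureml.core.container_registry import RegistryIdentity

env = Environment(name="my-env")

# The registry identity overrides any workspace connection for this environment.
identity = RegistryIdentity()
identity.resource_id = "<UAI resource ID>"
identity.client_id = "<UAI client ID>"
env.docker.base_image_registry.registry_identity = identity
env.docker.base_image = "my-acr.azurecr.io/my-repo/my-image:latest"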

Use Docker images for inference

Once you've configured ACR without the admin user as described earlier, you can access Docker images for inference without admin keys from your Azure Kubernetes Service (AKS). When you create or attach AKS to the workspace, the cluster's service principal is automatically assigned ACRPull access to the workspace ACR.

Note

If you bring your own AKS cluster, the cluster must have a service principal enabled instead of a managed identity.

Next steps