Create compute targets for model training and deployment in Azure Machine Learning studio

In this article, learn how to create and manage compute targets in Azure Machine Learning studio. You can also create and manage compute targets with the SDK and extensions.

Prerequisites

What's a compute target?

With Azure Machine Learning, you can train your model on a variety of resources or environments, collectively referred to as compute targets. A compute target can be a local machine or a cloud resource, such as Azure Machine Learning compute, Azure HDInsight, or a remote virtual machine. You can also create compute targets for model deployment, as described in "Where and how to deploy your models".

View compute targets

To see all compute targets for your workspace, use the following steps:

  1. Navigate to Azure Machine Learning studio.

  2. Under Manage, select Compute.

  3. Select the tabs at the top to show each type of compute target.

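The same per-type view can be approximated from code. The following is a minimal sketch, assuming the Azure Machine Learning Python SDK (azureml-core) is installed and a config.json file for the workspace is available locally. The helper names group_targets_by_type and list_workspace_targets are illustrative and not part of the SDK.

```python
from collections import defaultdict


def group_targets_by_type(compute_targets):
    """Group compute target names by their type, similar to the studio tabs.

    `compute_targets` is any mapping of name -> object with a `type`
    attribute, such as the dict exposed by `Workspace.compute_targets`.
    """
    groups = defaultdict(list)
    for name, target in compute_targets.items():
        groups[target.type].append(name)
    # Sort types and names so the result is stable and easy to read
    return {kind: sorted(names) for kind, names in sorted(groups.items())}


def list_workspace_targets():
    """Load the workspace from config.json and group its compute targets."""
    # Imported lazily so the grouping helper also works without the SDK
    from azureml.core import Workspace

    ws = Workspace.from_config()
    return group_targets_by_type(ws.compute_targets)
```

Calling list_workspace_targets() requires access to a real workspace. The grouping helper itself only needs objects that expose a type attribute.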

Create compute target

Follow the previous steps to view the list of compute targets. Then use these steps to create a compute target:

  1. Select the tab at the top corresponding to the type of compute you will create.

  2. If you have no compute targets, select Create in the middle of the page.


  3. If you see a list of compute resources, select +New above the list.


  4. Fill out the form for your compute type. The sections that follow describe the form for compute instances, compute clusters, inference clusters, and attached compute.

  5. Select Create.

  6. View the status of the create operation by selecting the compute target from the list.


Compute instance

Use the steps above to create the compute instance. Then fill out the form as follows:


Field: Description
Compute name:
  • Name is required and must be between 3 and 24 characters long.
  • Valid characters are upper and lower case letters, digits, and the - character.
  • Name must start with a letter.
  • Name must be unique across all existing computes within an Azure region. You will see an alert if the name you choose is not unique.
  • If the - character is used, it must be followed by at least one letter later in the name.
Virtual machine type: Choose CPU or GPU. This type cannot be changed after creation.
Virtual machine size: Supported virtual machine sizes might be restricted in your region. Check the availability list.
Enable/disable SSH access: SSH access is disabled by default and cannot be changed after creation. Make sure to enable access if you plan to debug interactively with VS Code Remote.
Advanced settings: Optional. Configure a virtual network. Specify the Resource group, Virtual network, and Subnet to create the compute instance inside an Azure Virtual Network (vnet). For more information, see these network requirements for vnet.
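
The compute name rules in the table can be checked locally before you fill out the form. The following is a minimal sketch of those rules. The function name validate_compute_name is illustrative, and the uniqueness rule is not covered because it can only be verified by the service.

```python
import re


def validate_compute_name(name, min_len=3, max_len=24):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not min_len <= len(name) <= max_len:
        problems.append(
            f"length must be between {min_len} and {max_len} characters"
        )
    if not re.fullmatch(r"[A-Za-z0-9-]*", name):
        problems.append("only letters, digits, and '-' are allowed")
    if not re.match(r"[A-Za-z]", name):
        problems.append("name must start with a letter")
    # Every '-' needs a letter somewhere after it, so checking the
    # text after the last '-' is sufficient.
    last_hyphen = name.rfind("-")
    if last_hyphen != -1 and not re.search(r"[A-Za-z]", name[last_hyphen + 1:]):
        problems.append("'-' must be followed by at least one letter")
    return problems
```

The length bounds are parameters because other compute types use different limits, for example 2 to 16 characters for inference clusters.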

Compute clusters

Create a single-node or multi-node compute cluster for your training, batch inferencing, or reinforcement learning workloads. Use the steps above to create the compute cluster. Then fill out the form as follows:

Field: Description
Compute name:
  • Name is required and must be between 3 and 24 characters long.
  • Valid characters are upper and lower case letters, digits, and the - character.
  • Name must start with a letter.
  • Name must be unique across all existing computes within an Azure region. You will see an alert if the name you choose is not unique.
  • If the - character is used, it must be followed by at least one letter later in the name.
Virtual machine type: Choose CPU or GPU. This type cannot be changed after creation.
Virtual machine priority: Choose Dedicated or Low priority. Low priority virtual machines are cheaper but don't guarantee the compute nodes. Your job may be preempted.
Virtual machine size: Supported virtual machine sizes might be restricted in your region. Check the availability list.
Minimum number of nodes: Minimum number of nodes that you want to provision. If you want a dedicated number of nodes, set that count here. Save money by setting the minimum to 0, so you won't pay for any nodes when the cluster is idle.
Maximum number of nodes: Maximum number of nodes that you want to provision. The compute will autoscale to a maximum of this node count when a job is submitted.
Advanced settings: Optional. Configure a virtual network. Specify the Resource group, Virtual network, and Subnet to create the compute cluster inside an Azure Virtual Network (vnet). For more information, see these network requirements for vnet. Also attach managed identities to grant access to resources.
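
The form fields above map closely onto the Python SDK. The following is a minimal sketch, assuming azureml-core is installed and that ws is a Workspace object you have already loaded. The helper names cluster_settings and create_cluster are illustrative, and the VM size in the usage note is only an example value.

```python
def cluster_settings(vm_size, min_nodes=0, max_nodes=1, low_priority=False):
    """Build keyword arguments that mirror the compute cluster form fields."""
    if min_nodes < 0:
        raise ValueError("min_nodes must be 0 or greater")
    if max_nodes < max(min_nodes, 1):
        raise ValueError("max_nodes must be at least 1 and not below min_nodes")
    return {
        "vm_size": vm_size,
        # "Virtual machine priority" in the form
        "vm_priority": "lowpriority" if low_priority else "dedicated",
        # min_nodes=0 lets the cluster scale to zero when idle
        "min_nodes": min_nodes,
        "max_nodes": max_nodes,
    }


def create_cluster(ws, name, settings):
    """Provision the cluster in workspace `ws` and wait until it is ready."""
    # Imported lazily so cluster_settings() is usable without the SDK
    from azureml.core.compute import AmlCompute, ComputeTarget

    config = AmlCompute.provisioning_configuration(**settings)
    cluster = ComputeTarget.create(ws, name, config)
    cluster.wait_for_completion(show_output=True)
    return cluster
```

For example, create_cluster(ws, "cpu-cluster", cluster_settings("STANDARD_D2_V2", max_nodes=4)) would request a dedicated cluster that scales between 0 and 4 nodes.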

Set up managed identity

Azure Machine Learning compute clusters also support managed identities to authenticate access to Azure resources without including credentials in your code. There are two types of managed identities:

  • A system-assigned managed identity is enabled directly on the Azure Machine Learning compute cluster. The life cycle of a system-assigned identity is directly tied to the compute cluster. If the compute cluster is deleted, Azure automatically cleans up the credentials and the identity in Azure AD.
  • A user-assigned managed identity is a standalone Azure resource provided through the Azure Managed Identity service. You can assign a user-assigned managed identity to multiple resources, and it persists for as long as you want.

During cluster creation or when editing compute cluster details, in the Advanced settings, toggle Assign a managed identity and specify a system-assigned identity or user-assigned identity.
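
When a cluster is created from the Python SDK rather than the studio, the same choice is expressed through the identity_type and identity_id arguments of AmlCompute.provisioning_configuration. The following is a minimal sketch that builds those arguments. The helper name identity_settings is illustrative, and any resource ID passed in is assumed to be the full Azure resource ID of an existing user-assigned identity.

```python
def identity_settings(user_assigned_ids=None):
    """Return identity keyword arguments for a compute cluster.

    With no IDs, request a system-assigned identity. Otherwise request
    user-assigned identities for the given Azure resource IDs.
    """
    if not user_assigned_ids:
        return {"identity_type": "SystemAssigned"}
    return {
        "identity_type": "UserAssigned",
        # The first ID in this list becomes the default identity
        "identity_id": list(user_assigned_ids),
    }
```

The returned dictionary can be merged into the other provisioning arguments, for example AmlCompute.provisioning_configuration(vm_size=..., **identity_settings()).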

Managed identity usage

The default managed identity is the system-assigned managed identity or the first user-assigned managed identity.

During a run, there are two applications of an identity:

  1. The system uses an identity to set up the user's storage mounts, container registry, and datastores.

    • In this case, the system will use the default managed identity.
  2. The user applies an identity to access resources from within the code for a submitted run.

    • In this case, provide the client_id corresponding to the managed identity you want to use to retrieve a credential.
    • Alternatively, get the user-assigned identity's client ID through the DEFAULT_IDENTITY_CLIENT_ID environment variable.

    For example, to retrieve a token for a datastore with the default managed identity:

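      import os
      from azure.identity import ManagedIdentityCredential
      # Client ID of the default managed identity, read from the environment
      client_id = os.environ.get('DEFAULT_IDENTITY_CLIENT_ID')
      credential = ManagedIdentityCredential(client_id=client_id)
      token = credential.get_token('https://storage.azure.com/')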
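
The two ways of choosing a client ID described above can be combined into one helper. The following is a minimal sketch, assuming the azure-identity package is available inside the run. The function names resolve_client_id and storage_token are illustrative.

```python
import os


def resolve_client_id(explicit_client_id=None, environ=os.environ):
    """Pick the client ID to authenticate with.

    An explicitly supplied ID wins. Otherwise fall back to the
    DEFAULT_IDENTITY_CLIENT_ID environment variable (None if unset).
    """
    return explicit_client_id or environ.get("DEFAULT_IDENTITY_CLIENT_ID")


def storage_token(explicit_client_id=None):
    """Request an Azure Storage token for the chosen managed identity."""
    # Imported lazily so resolve_client_id() works without azure-identity
    from azure.identity import ManagedIdentityCredential

    client_id = resolve_client_id(explicit_client_id)
    credential = ManagedIdentityCredential(client_id=client_id)
    return credential.get_token("https://storage.azure.com/")
```

storage_token() only succeeds where a managed identity endpoint is reachable, such as on the compute cluster itself.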
      

Inference clusters

Important

Using Azure Kubernetes Service with Azure Machine Learning has multiple configuration options. Some scenarios, such as networking, require additional setup and configuration. For more information on using AKS with Azure Machine Learning, see Create and attach an Azure Kubernetes Service cluster.

Create or attach an Azure Kubernetes Service (AKS) cluster for large-scale inferencing. Use the steps above to create the AKS cluster. Then fill out the form as follows:

Field: Description
Compute name:
  • Name is required and must be between 2 and 16 characters long.
  • Valid characters are upper and lower case letters, digits, and the - character.
  • Name must start with a letter.
  • Name must be unique across all existing computes within an Azure region. You will see an alert if the name you choose is not unique.
  • If the - character is used, it must be followed by at least one letter later in the name.
Kubernetes Service: Select Create New and fill out the rest of the form. Or select Use existing and then select an existing AKS cluster from your subscription.
Region: Select the region where the cluster will be created.
Virtual machine size: Supported virtual machine sizes might be restricted in your region. Check the availability list.
Cluster purpose: Select Production or Dev-test.
Number of nodes: The number of nodes multiplied by the virtual machine's number of cores (vCPUs) must be greater than or equal to 12.
Network configuration: Select Advanced to create the compute within an existing virtual network. For more information about AKS in a virtual network, see Network isolation during training and inference with private endpoints and virtual networks.
Enable SSL configuration: Use this to configure an SSL certificate on the compute.
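
The "Number of nodes" rule is simple arithmetic, so it is easy to check before you submit the form. The following is a minimal sketch of that check. The function names are illustrative, and the vCPU count per node is something you look up for your chosen virtual machine size.

```python
MIN_TOTAL_VCPUS = 12  # nodes x vCPUs per node must reach this total


def meets_core_requirement(node_count, vcpus_per_node):
    """Return True when the cluster has at least 12 vCPUs in total."""
    return node_count * vcpus_per_node >= MIN_TOTAL_VCPUS


def min_nodes_for(vcpus_per_node):
    """Smallest node count whose total vCPUs reach the minimum."""
    if vcpus_per_node < 1:
        raise ValueError("vcpus_per_node must be at least 1")
    # Integer ceiling division avoids floating-point rounding
    return (MIN_TOTAL_VCPUS + vcpus_per_node - 1) // vcpus_per_node
```

For example, a size with 4 vCPUs per node needs at least 3 nodes, because 3 x 4 = 12.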

Attached compute

To use compute targets created outside the Azure Machine Learning workspace, you must attach them. Attaching a compute target makes it available to your workspace. Use Attached compute to attach a compute target for training. Use Inference clusters to attach an AKS cluster for inferencing.

Use the steps above to attach a compute. Then fill out the form as follows:

  1. Enter a name for the compute target.

  2. Select the type of compute to attach. Not all compute types can be attached from Azure Machine Learning studio. The compute types that can currently be attached for training include:

    • A remote VM
    • Azure Databricks (for use in machine learning pipelines)
    • Azure Data Lake Analytics (for use in machine learning pipelines)
    • Azure HDInsight
  3. Fill out the form and provide values for the required properties.

    Note

    Microsoft recommends that you use SSH keys, which are more secure than passwords. Passwords are vulnerable to brute force attacks. SSH keys rely on cryptographic signatures. For information on how to create SSH keys for use with Azure Virtual Machines, see the following documents:

  4. Select Attach.
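
Attaching a remote VM can also be scripted. The following is a minimal sketch, assuming azureml-core is installed and that ws is a Workspace object you have already loaded. The helper names remote_vm_settings and attach_remote_vm are illustrative. Following the note above, the sketch prefers an SSH private key over a password when both are supplied.

```python
def remote_vm_settings(resource_id, username, private_key_file=None,
                       password=None, ssh_port=22):
    """Build keyword arguments for attaching a remote VM over SSH."""
    if not private_key_file and not password:
        raise ValueError("provide an SSH private key file or a password")
    settings = {
        "resource_id": resource_id,
        "ssh_port": ssh_port,
        "username": username,
    }
    # Prefer the SSH key; fall back to the password only if no key is given
    if private_key_file:
        settings["private_key_file"] = private_key_file
    else:
        settings["password"] = password
    return settings


def attach_remote_vm(ws, name, settings):
    """Attach the VM to workspace `ws` and wait for the operation to finish."""
    # Imported lazily so remote_vm_settings() is usable without the SDK
    from azureml.core.compute import ComputeTarget, RemoteCompute

    config = RemoteCompute.attach_configuration(**settings)
    target = ComputeTarget.attach(ws, name, config)
    target.wait_for_completion(show_output=True)
    return target
```

The resource_id argument is the Azure resource ID of the existing virtual machine.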

Next steps

After a target is created and attached to your workspace, use it in your run configuration with a ComputeTarget object:

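    from azureml.core import Workspace
    from azureml.core.compute import ComputeTarget
    ws = Workspace.from_config()  # load the workspace described by config.json
    myvm = ComputeTarget(workspace=ws, name='my-vm-name')  # look up the target by name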