Set up compute targets for model training and deployment

Learn how to attach Azure compute resources to your Azure Machine Learning workspace. Then you can use these resources as training and inference compute targets in your machine learning tasks.

In this article, learn how to set up your workspace to use these compute resources:

  • Your local computer
  • Remote virtual machines
  • Azure HDInsight
  • Azure Batch
  • Azure Databricks
  • Azure Data Lake Analytics
  • Azure Container Instance

To use compute targets managed by Azure Machine Learning, see:

Prerequisites

Limitations

  • Do not create multiple, simultaneous attachments to the same compute from your workspace. For example, don't attach one Azure Kubernetes Service cluster to a workspace by using two different names. Each new attachment will break the previous existing attachment(s).

    If you want to reattach a compute target, for example to change TLS or another cluster configuration setting, you must first remove the existing attachment.
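
    The following is a minimal sketch of that remove-then-reattach flow, assuming an existing workspace object ws; the attachment name 'attached-aks' is a hypothetical placeholder:

    from azureml.core.compute import ComputeTarget

    # Look up the existing attachment by the name it has in the workspace
    # ('attached-aks' is a placeholder) and remove it.
    existing_target = ComputeTarget(workspace=ws, name='attached-aks')
    existing_target.detach()

    # Re-attach afterwards with the updated settings, using the appropriate
    # attach_configuration() call shown later in this article.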

What's a compute target?

With Azure Machine Learning, you can train your model on a variety of resources or environments, collectively referred to as compute targets. A compute target can be a local machine or a cloud resource, such as an Azure Machine Learning Compute, Azure HDInsight, or a remote virtual machine. You also use compute targets for model deployment, as described in "Where and how to deploy your models".

Local computer

When you use your local computer for training, there is no need to create a compute target. Just submit the training run from your local machine.

When you use your local computer for inference, you must have Docker installed. To perform the deployment, use LocalWebservice.deploy_configuration() to define the port that the web service will use. Then use the normal deployment process, as described in Deploy models with Azure Machine Learning.
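
The following is a minimal sketch of a local deployment, assuming an existing workspace object ws, a registered model, and a scoring script; the model name 'my-model', the entry script score.py, the environment, and the port are all hypothetical placeholders:

from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice

# The environment must contain the packages your scoring script needs
myenv = Environment(name="local-inference-env")

# 'my-model' and score.py are placeholders; use your own registered model and entry script
model = Model(ws, name="my-model")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

# Define the port that the local Docker-based web service will use
deployment_config = LocalWebservice.deploy_configuration(port=8890)

service = Model.deploy(ws, "local-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)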

Remote virtual machines

Azure Machine Learning also supports attaching an Azure Virtual Machine. The VM must be an Azure Data Science Virtual Machine (DSVM). This VM is a pre-configured data science and AI development environment in Azure. It offers a curated choice of tools and frameworks for full-lifecycle machine learning development. For more information on how to use the DSVM with Azure Machine Learning, see Configure a development environment.

  1. Create: Create a DSVM before using it to train your model. To create this resource, see Provision the Data Science Virtual Machine for Linux (Ubuntu).

    Warning

    Azure Machine Learning only supports virtual machines that run Ubuntu. When you create a VM or choose an existing VM, you must select one that uses Ubuntu.

    Azure Machine Learning also requires the virtual machine to have a public IP address.

  2. Attach: To attach an existing virtual machine as a compute target, you must provide the resource ID, user name, and password for the virtual machine. The resource ID of the VM can be constructed from the subscription ID, resource group name, and VM name by using the following string format: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>

    from azureml.core.compute import RemoteCompute, ComputeTarget
    
    # Create the compute config 
    compute_target_name = "attach-dsvm"
    
    attach_config = RemoteCompute.attach_configuration(resource_id='<resource_id>',
                                                    ssh_port=22,
                                                    username='<username>',
                                                    password="<password>")
    
    # Attach the compute
    compute = ComputeTarget.attach(ws, compute_target_name, attach_config)
    
    compute.wait_for_completion(show_output=True)
    

    Or you can attach the DSVM to your workspace using Azure Machine Learning studio.

    Warning

    Do not create multiple, simultaneous attachments to the same DSVM from your workspace. Each new attachment will break the previous existing attachment(s).

  3. Configure: Create a run configuration for the DSVM compute target. Docker and conda are used to create and configure the training environment on the DSVM.

    from azureml.core import ScriptRunConfig
    from azureml.core.environment import Environment
    from azureml.core.conda_dependencies import CondaDependencies
    
    # Create environment
    myenv = Environment(name="myenv")
    
    # Specify the conda dependencies
    myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])
    
    # If no base image is explicitly specified the default CPU image "azureml.core.runconfig.DEFAULT_CPU_IMAGE" will be used
    # To use GPU in DSVM, you should specify the default GPU base Docker image or another GPU-enabled image:
    # myenv.docker.enabled = True
    # myenv.docker.base_image = azureml.core.runconfig.DEFAULT_GPU_IMAGE
    
    # Configure the run configuration with the Linux DSVM as the compute target and the environment defined above
    src = ScriptRunConfig(source_directory=".", script="train.py", compute_target=compute, environment=myenv) 
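
    With the run configuration in place, you can submit the training run to the attached DSVM. This is a minimal sketch that assumes an existing workspace object ws; the experiment name 'train-on-dsvm' is a hypothetical placeholder:

    from azureml.core import Experiment

    # Submit the ScriptRunConfig defined above and stream the run output
    run = Experiment(workspace=ws, name='train-on-dsvm').submit(config=src)
    run.wait_for_completion(show_output=True)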
    

Azure HDInsight

Azure HDInsight is a popular platform for big-data analytics. The platform provides Apache Spark, which can be used to train your model.

  1. Create: Create the HDInsight cluster before you use it to train your model. To create a Spark on HDInsight cluster, see Create a Spark Cluster in HDInsight.

    Warning

    Azure Machine Learning requires the HDInsight cluster to have a public IP address.

    When you create the cluster, you must specify an SSH user name and password. Take note of these values, as you need them to use HDInsight as a compute target.

    After the cluster is created, connect to it with the hostname <clustername>-ssh.azurehdinsight.cn, where <clustername> is the name that you provided for the cluster.

  2. Attach: To attach an HDInsight cluster as a compute target, you must provide the resource ID, user name, and password for the HDInsight cluster. The resource ID of the HDInsight cluster can be constructed from the subscription ID, resource group name, and cluster name by using the following string format: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>

    from azureml.core.compute import ComputeTarget, HDInsightCompute
    from azureml.exceptions import ComputeTargetException
    
    try:
        # If you want to connect using an SSH key instead of a username/password,
        # you can provide the parameters private_key_file and private_key_passphrase
        attach_config = HDInsightCompute.attach_configuration(resource_id='<resource_id>',
                                                              ssh_port=22,
                                                              username='<ssh-username>',
                                                              password='<ssh-pwd>')
        hdi_compute = ComputeTarget.attach(workspace=ws,
                                           name='myhdi',
                                           attach_configuration=attach_config)

    except ComputeTargetException as e:
        print("Caught = {}".format(e.message))

    hdi_compute.wait_for_completion(show_output=True)
    

    Or you can attach the HDInsight cluster to your workspace using Azure Machine Learning studio.

    Warning

    Do not create multiple, simultaneous attachments to the same HDInsight cluster from your workspace. Each new attachment will break the previous existing attachment(s).

  3. Configure: Create a run configuration for the HDI compute target.

     from azureml.core.runconfig import RunConfiguration
     from azureml.core.conda_dependencies import CondaDependencies
    
    
     # use pyspark framework
     run_hdi = RunConfiguration(framework="pyspark")
    
     # Set compute target to the HDI cluster
     run_hdi.target = hdi_compute.name
    
     # Specify a CondaDependencies object so that numpy is installed in the environment
     cd = CondaDependencies()
     cd.add_conda_package('numpy')
     run_hdi.environment.python.conda_dependencies = cd
    

Now that you've attached the compute and configured your run, the next step is to submit the training run.

Azure Batch

Azure Batch is used to run large-scale parallel and high-performance computing (HPC) applications efficiently in the cloud. AzureBatchStep can be used in an Azure Machine Learning pipeline to submit jobs to an Azure Batch pool of machines.

To attach Azure Batch as a compute target, you must use the Azure Machine Learning SDK and provide the following information:

  • Azure Batch compute name: A friendly name to be used for the compute within the workspace
  • Azure Batch account name: The name of the Azure Batch account
  • Resource group: The resource group that contains the Azure Batch account

The following code demonstrates how to attach Azure Batch as a compute target:

from azureml.core.compute import ComputeTarget, BatchCompute
from azureml.exceptions import ComputeTargetException

# Name to associate with new compute in workspace
batch_compute_name = 'mybatchcompute'

# Batch account details needed to attach as compute to workspace
batch_account_name = "<batch_account_name>"  # Name of the Batch account
# Name of the resource group which contains this account
batch_resource_group = "<batch_resource_group>"

try:
    # check if the compute is already attached
    batch_compute = BatchCompute(ws, batch_compute_name)
except ComputeTargetException:
    print('Attaching Batch compute...')
    provisioning_config = BatchCompute.attach_configuration(
        resource_group=batch_resource_group, account_name=batch_account_name)
    batch_compute = ComputeTarget.attach(
        ws, batch_compute_name, provisioning_config)
    batch_compute.wait_for_completion()
    print("Provisioning state:{}".format(batch_compute.provisioning_state))
    print("Provisioning errors:{}".format(batch_compute.provisioning_errors))

print("Using Batch compute:{}".format(batch_compute.cluster_resource_id))

Warning

Do not create multiple, simultaneous attachments to the same Azure Batch account from your workspace. Each new attachment will break the previous existing attachment(s).

Azure Databricks

Azure Databricks is an Apache Spark-based environment in the Azure cloud. It can be used as a compute target with an Azure Machine Learning pipeline.

Create an Azure Databricks workspace before using it. To create a workspace resource, see the Run a Spark job on Azure Databricks document.

To attach Azure Databricks as a compute target, provide the following information:

  • Databricks compute name: The name you want to assign to this compute resource.
  • Databricks workspace name: The name of the Azure Databricks workspace.
  • Databricks access token: The access token used to authenticate to Azure Databricks. To generate an access token, see the Authentication document.

The following code demonstrates how to attach Azure Databricks as a compute target with the Azure Machine Learning SDK (the Databricks workspace must be in the same subscription as your Azure Machine Learning workspace):

import os
from azureml.core.compute import ComputeTarget, DatabricksCompute
from azureml.exceptions import ComputeTargetException

databricks_compute_name = os.environ.get(
    "AML_DATABRICKS_COMPUTE_NAME", "<databricks_compute_name>")
databricks_workspace_name = os.environ.get(
    "AML_DATABRICKS_WORKSPACE", "<databricks_workspace_name>")
databricks_resource_group = os.environ.get(
    "AML_DATABRICKS_RESOURCE_GROUP", "<databricks_resource_group>")
databricks_access_token = os.environ.get(
    "AML_DATABRICKS_ACCESS_TOKEN", "<databricks_access_token>")

try:
    databricks_compute = ComputeTarget(
        workspace=ws, name=databricks_compute_name)
    print('Compute target already exists')
except ComputeTargetException:
    print('compute not found')
    print('databricks_compute_name {}'.format(databricks_compute_name))
    print('databricks_workspace_name {}'.format(databricks_workspace_name))
    print('databricks_access_token {}'.format(databricks_access_token))

    # Create attach config
    attach_config = DatabricksCompute.attach_configuration(resource_group=databricks_resource_group,
                                                           workspace_name=databricks_workspace_name,
                                                           access_token=databricks_access_token)
    databricks_compute = ComputeTarget.attach(
        ws,
        databricks_compute_name,
        attach_config
    )

    databricks_compute.wait_for_completion(True)
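
After the Databricks workspace is attached, you can use it from a DatabricksStep in a pipeline. The following is a hedged sketch under these assumptions: the notebook path and run name are placeholders, and cluster settings such as num_workers (or an existing_cluster_id) may need adjusting for your workspace:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DatabricksStep

# Run a Databricks notebook as a pipeline step; the notebook path is a placeholder
db_step = DatabricksStep(name="databricks-notebook-step",
                         notebook_path="/Users/<user>/my-notebook",
                         run_name="aml-databricks-run",
                         num_workers=1,
                         compute_target=databricks_compute,
                         allow_reuse=False)

pipeline = Pipeline(workspace=ws, steps=[db_step])
pipeline_run = Experiment(ws, 'databricks-pipeline').submit(pipeline)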

For a more detailed example, see an example notebook on GitHub.

Warning

Do not create multiple, simultaneous attachments to the same Azure Databricks workspace from your Azure Machine Learning workspace. Each new attachment will break the previous existing attachment(s).

Azure Data Lake Analytics

Azure Data Lake Analytics is a big data analytics platform in the Azure cloud. It can be used as a compute target with an Azure Machine Learning pipeline.

Create an Azure Data Lake Analytics account before using it.

To attach Data Lake Analytics as a compute target, you must use the Azure Machine Learning SDK and provide the following information:

  • Compute name: The name you want to assign to this compute resource.
  • Resource group: The resource group that contains the Data Lake Analytics account.
  • Account name: The Data Lake Analytics account name.

The following code demonstrates how to attach Data Lake Analytics as a compute target:

import os
from azureml.core.compute import ComputeTarget, AdlaCompute
from azureml.exceptions import ComputeTargetException


adla_compute_name = os.environ.get(
    "AML_ADLA_COMPUTE_NAME", "<adla_compute_name>")
adla_resource_group = os.environ.get(
    "AML_ADLA_RESOURCE_GROUP", "<adla_resource_group>")
adla_account_name = os.environ.get(
    "AML_ADLA_ACCOUNT_NAME", "<adla_account_name>")

try:
    adla_compute = ComputeTarget(workspace=ws, name=adla_compute_name)
    print('Compute target already exists')
except ComputeTargetException:
    print('compute not found')
    print('adla_compute_name {}'.format(adla_compute_name))
    print('adla_resource_group {}'.format(adla_resource_group))
    print('adla_account_name {}'.format(adla_account_name))
    # create attach config
    attach_config = AdlaCompute.attach_configuration(resource_group=adla_resource_group,
                                                     account_name=adla_account_name)
    # Attach ADLA
    adla_compute = ComputeTarget.attach(
        ws,
        adla_compute_name,
        attach_config
    )

    adla_compute.wait_for_completion(True)
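
After the Data Lake Analytics account is attached, you can run U-SQL from an AdlaStep in a pipeline. The following is a hedged sketch; the script name and source directory are hypothetical placeholders, and the U-SQL script itself is assumed to exist:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import AdlaStep

# Run a U-SQL script as a pipeline step; 'scripts/script.usql' is a placeholder
usql_step = AdlaStep(name="adla-usql-step",
                     script_name="script.usql",
                     source_directory="scripts",
                     compute_target=adla_compute)

pipeline = Pipeline(workspace=ws, steps=[usql_step])
pipeline_run = Experiment(ws, 'adla-pipeline').submit(pipeline)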

For a more detailed example, see an example notebook on GitHub.

Warning

Do not create multiple, simultaneous attachments to the same ADLA account from your workspace. Each new attachment will break the previous existing attachment(s).

Tip

Azure Machine Learning pipelines can only work with data stored in the default data store of the Data Lake Analytics account. If the data you need to work with is in a non-default store, you can use a DataTransferStep to copy the data before training.
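
A minimal sketch of such a copy step follows. It assumes an attached Azure Data Factory compute target (data_factory_compute) and two datastores registered in the workspace; the datastore names and paths are hypothetical placeholders:

from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import DataTransferStep

# 'source_store' and 'adla_default_store' are placeholder datastore names
source_ds = Datastore(ws, 'source_store')
default_ds = Datastore(ws, 'adla_default_store')

source_ref = DataReference(datastore=source_ds,
                           data_reference_name='raw_data',
                           path_on_datastore='raw/')
dest_ref = DataReference(datastore=default_ds,
                         data_reference_name='training_data',
                         path_on_datastore='training/')

# Copy the data to the default store before the training step runs
transfer_step = DataTransferStep(name='copy-to-default-store',
                                 source_data_reference=source_ref,
                                 destination_data_reference=dest_ref,
                                 compute_target=data_factory_compute)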

Azure Container Instance

Azure Container Instances (ACI) are created dynamically when you deploy a model. You cannot create or attach ACI to your workspace in any other way. For more information, see Deploy a model to Azure Container Instances.
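
Although ACI is not attached ahead of time, you do choose its capacity when you deploy. This is a minimal sketch of an ACI deployment configuration; the resulting object is passed to Model.deploy() together with your model and inference configuration, as described in the linked article:

from azureml.core.webservice import AciWebservice

# Request 1 CPU core and 1 GB of memory for the container instance
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)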

Azure Kubernetes Service

Azure Kubernetes Service (AKS) allows for a variety of configuration options when used with Azure Machine Learning. For more information, see How to create and attach Azure Kubernetes Service.
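
For orientation, attaching an existing AKS cluster follows the same pattern as the other attach examples in this article; the following is a hedged sketch, with the resource group, cluster name, and attachment name as placeholders (see the linked article for the full set of options):

from azureml.core.compute import AksCompute, ComputeTarget

# Attach an existing AKS cluster; '<resource_group>' and '<cluster_name>' are placeholders
attach_config = AksCompute.attach_configuration(resource_group='<resource_group>',
                                                cluster_name='<cluster_name>')
aks_target = ComputeTarget.attach(ws, 'myaks', attach_config)
aks_target.wait_for_completion(show_output=True)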

Notebook examples

See these notebooks for examples of training with various compute targets:

Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.

Next steps