Train models with automated machine learning in the cloud

APPLIES TO: Basic edition and Enterprise edition (Upgrade to Enterprise edition)

In Azure Machine Learning, you train your model on different types of compute resources that you manage. The compute target can be a local computer or a resource in the cloud.

You can scale your machine learning experiment up or out by adding compute targets, such as Azure Machine Learning Compute (AmlCompute). AmlCompute is a managed compute infrastructure that lets you create single-node or multi-node compute.

In this article, you learn how to build a model using automated ML with AmlCompute.

How does remote differ from local?

The tutorial "Train a classification model with automated machine learning" teaches you how to use a local computer to train a model with automated ML. The same workflow applies to remote targets. With remote compute, however, automated ML experiment iterations run asynchronously, so you can cancel a particular iteration, watch the status of the execution, or continue working in other cells of the Jupyter notebook. To train remotely, you first create a remote compute target such as AmlCompute. Then you configure the remote resource and submit your code there.
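Because remote iterations run asynchronously, a notebook can check on a run without blocking on it. The following is a minimal sketch of one way to poll a run until it finishes. It assumes the run object exposes get_status() and cancel(), as azureml.core.Run does, and that "Completed", "Failed", and "Canceled" are the terminal status strings. The names wait_for_run and FakeRun are hypothetical and introduced only for illustration; FakeRun stands in for a real run so the sketch can be executed without a workspace.

```python
import time

# Status strings treated as terminal in this sketch (an assumption).
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_run(run, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll run.get_status() until a terminal state is reported.

    If the timeout elapses first, request cancellation with run.cancel()
    and return "Canceled" without waiting for the service to confirm it.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = run.get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            run.cancel()
            return "Canceled"
        time.sleep(poll_seconds)


class FakeRun:
    """Hypothetical stand-in for a remote run, used only to exercise the helper."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.canceled = False

    def get_status(self):
        # Report each queued status once, then keep repeating the last one.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]

    def cancel(self):
        self.canceled = True


demo_run = FakeRun(["Queued", "Running", "Completed"])
print(wait_for_run(demo_run, poll_seconds=0.01, timeout_seconds=5))
```

With a real run, you would pass the object returned by experiment.submit() in place of FakeRun and choose a polling interval of at least several seconds.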

This article shows the extra steps needed to run an automated ML experiment on a remote AmlCompute target. The workspace object, ws, from the tutorial is used throughout the code here.

from azureml.core import Workspace

ws = Workspace.from_config()

Create resource

Create the AmlCompute target in your workspace (ws) if it doesn't already exist.

Time estimate: Creating the AmlCompute target takes approximately 5 minutes.

from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpu-cluster")
# environment variables are strings, so convert the node counts to integers
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    
    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

You can now use the compute_target object as the remote compute target.

Cluster name restrictions include:

  • Must be shorter than 64 characters.
  • Cannot include any of the following characters: \ ~ ! @ # $ % ^ & * ( ) = + _ [ ] { } | ; : ' " , < > / ?
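The two restrictions above can be checked locally before you try to create a cluster. The following is a minimal sketch, not an official validator: the function name cluster_name_problems is hypothetical, the check covers only the two rules listed here, and the service may enforce additional rules.

```python
# Characters ruled out by the list above.
# Assumption: "." and "`" are not treated as forbidden in this sketch.
FORBIDDEN_CHARS = set("\\~!@#$%^&*()=+_[]{}|;:'\",<>/?")
MAX_NAME_LENGTH = 63  # "shorter than 64 characters"


def cluster_name_problems(name):
    """Return a list of problems with the name; an empty list means it passes."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains forbidden characters: " + " ".join(bad))
    return problems


print(cluster_name_problems("cpu-cluster"))
print(cluster_name_problems("cpu_cluster!"))
```

Returning a list of problems rather than a single boolean makes it easier to report every issue with a proposed name at once.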

Access data using TabularDataset function

Define training_data as a TabularDataset and specify the label column; both are passed to automated ML in AutoMLConfig. By default, the TabularDataset method from_delimited_files sets infer_column_types to true, which infers the column types automatically.

If you want to set the column types manually, use the set_column_types argument to specify the type of each column. In the following code sample, the data comes from the sklearn package.

from sklearn import datasets
from azureml.core.dataset import Dataset
from scipy import sparse
import numpy as np
import pandas as pd
import os

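# Create the local data folder and the project folder if they don't exist
if not os.path.isdir('data'):
    os.mkdir('data')

project_folder = 'project_folder'
if not os.path.exists(project_folder):
    os.makedirs(project_folder)

# Load the digits dataset that ships with scikit-learn
data_train = datasets.load_digits()

X = pd.DataFrame(data_train.data[100:,:])
y = pd.DataFrame(data_train.target[100:])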

# merge X and y
label = "digit"
X[label] = y

training_data = X

training_data.to_csv('data/digits.csv')
ds = ws.get_default_datastore()
ds.upload(src_dir='./data', target_path='digitsdata', overwrite=True, show_progress=True)

training_data = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/digits.csv'))

Configure experiment

Specify the settings for AutoMLConfig. (See a full list of parameters and their possible values.)

from azureml.train.automl import AutoMLConfig
import time
import logging

automl_settings = {
    "name": "AutoML_Demo_Experiment_{0}".format(time.time()),
    "experiment_timeout_minutes" : 20,
    "enable_early_stopping" : True,
    "iteration_timeout_minutes": 10,
    "n_cross_validations": 5,
    "primary_metric": 'AUC_weighted',
    "max_concurrent_iterations": 10,
}

automl_config = AutoMLConfig(task='classification',
                             debug_log='automl_errors.log',
                             path=project_folder,
                             compute_target=compute_target,
                             training_data=training_data,
                             label_column_name=label,
                             **automl_settings,
                             )
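The settings above request 10 concurrent iterations, while the cluster created earlier has at most 4 nodes. Assuming each AmlCompute node runs one iteration at a time, the extra iterations would simply wait for a free node, so it can be clearer to cap the value at the node count. The following is a small sketch of that idea using plain dictionaries; cap_concurrency is a hypothetical helper, not part of the SDK.

```python
def cap_concurrency(settings, max_nodes):
    """Return a copy of settings with max_concurrent_iterations limited to max_nodes."""
    capped = dict(settings)  # leave the caller's dictionary unchanged
    requested = capped.get("max_concurrent_iterations")
    if requested is not None and requested > max_nodes:
        capped["max_concurrent_iterations"] = max_nodes
    return capped


example_settings = {
    "iteration_timeout_minutes": 10,
    "max_concurrent_iterations": 10,
}
print(cap_concurrency(example_settings, max_nodes=4))
```

You could pass the capped dictionary to AutoMLConfig in the same way as automl_settings, using the cluster's maximum node count as max_nodes.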

Submit training experiment

Now submit the configuration to automatically select the algorithm and hyperparameters and train the model.

from azureml.core.experiment import Experiment
experiment = Experiment(ws, 'automl_remote')
remote_run = experiment.submit(automl_config, show_output=True)

You will see output similar to the following example:

Running on remote compute: mydsvm
Parent Run ID: AutoML_015ffe76-c331-406d-9bfd-0fd42d8ab7f6
***********************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE:  A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
***********************************************************************************************

 ITERATION     PIPELINE                               DURATION                METRIC      BEST
         2      Standardize SGD classifier            0:02:36                  0.954     0.954
         7      Normalizer DT                         0:02:22                  0.161     0.954
         0      Scale MaxAbs 1 extra trees            0:02:45                  0.936     0.954
         4      Robust Scaler SGD classifier          0:02:24                  0.867     0.954
         1      Normalizer kNN                        0:02:44                  0.984     0.984
         9      Normalizer extra trees                0:03:15                  0.834     0.984
         5      Robust Scaler DT                      0:02:18                  0.736     0.984
         8      Standardize kNN                       0:02:05                  0.981     0.984
         6      Standardize SVM                       0:02:18                  0.984     0.984
        10      Scale MaxAbs 1 DT                     0:02:18                  0.077     0.984
        11      Standardize SGD classifier            0:02:24                  0.863     0.984
         3      Standardize gradient boosting         0:03:03                  0.971     0.984
        12      Robust Scaler logistic regression     0:02:32                  0.955     0.984
        14      Scale MaxAbs 1 SVM                    0:02:15                  0.989     0.989
        13      Scale MaxAbs 1 gradient boosting      0:02:15                  0.971     0.989
        15      Robust Scaler kNN                     0:02:28                  0.904     0.989
        17      Standardize kNN                       0:02:22                  0.974     0.989
        16      Scale 0/1 gradient boosting           0:02:18                  0.968     0.989
        18      Scale 0/1 extra trees                 0:02:18                  0.828     0.989
        19      Robust Scaler kNN                     0:02:32                  0.983     0.989
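The BEST column is a running maximum of the METRIC column in the order that iterations finish, which is why iterations appear out of numeric order when they run concurrently. The following sketch reproduces the first five rows of the sample output; it assumes a metric where higher is better, as with AUC_weighted, and running_best is a hypothetical helper name.

```python
from itertools import accumulate

# (iteration, metric) pairs in completion order, copied from the first
# five rows of the sample output.
finished = [(2, 0.954), (7, 0.161), (0, 0.936), (4, 0.867), (1, 0.984)]


def running_best(rows):
    """Return the best metric observed so far after each finished iteration."""
    metrics = [metric for _, metric in rows]
    return list(accumulate(metrics, max))


for (iteration, metric), best in zip(finished, running_best(finished)):
    print(f"{iteration:>3}  {metric:.3f}  {best:.3f}")
```

For a metric that should be minimized, replace max with min.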

Explore results

You can use the same Jupyter widget shown in the training tutorial to see a graph and table of results.

from azureml.widgets import RunDetails
RunDetails(remote_run).show()

Here is a static image of the widget. In the notebook, you can click any row in the table to see the run properties and output logs for that run. You can also use the dropdown above the graph to view a graph of each available metric for each iteration.

[Image: widget table] [Image: widget plot]

The widget displays a URL that you can use to see and explore the individual run details.

If you aren't in a Jupyter notebook, you can display the URL from the run itself:

remote_run.get_portal_url()

The same information is available in your workspace. To learn more about these results, see Understand automated machine learning results.

Example

The following notebook demonstrates concepts in this article.

Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.

Next steps