Retrain models with Azure Machine Learning designer (preview)

APPLIES TO: Enterprise edition. Not available in Basic edition (upgrade to Enterprise).

In this how-to article, you learn how to use Azure Machine Learning designer to retrain a machine learning model. You use published pipelines to automate your workflow and set parameters to train your model on new data.

In this article, you learn how to:

  • Train a machine learning model.
  • Create a pipeline parameter.
  • Publish your training pipeline.
  • Retrain your model with new parameters.

Prerequisites

This article assumes that you have basic knowledge of building pipelines in the designer. For a guided introduction, complete the tutorial.

Sample pipeline

The pipeline used in this article is an altered version of Sample 3: Income prediction. The pipeline uses the Import Data module instead of the sample dataset to show you how to train models using your own data.

Screenshot showing the modified sample pipeline, with a box highlighting the Import Data module

Create a pipeline parameter

Create pipeline parameters to dynamically set variables at runtime. In this example, you change the training data path from a fixed value to a parameter, so that you can retrain your model on different data.

  1. Select the Import Data module.

    Note

    This example uses the Import Data module to access data in a registered datastore. However, you can follow similar steps if you use alternative data access patterns.

  2. In the module details pane, to the right of the canvas, select your data source.

  3. Enter the path to your data. You can also select Browse path to browse your file tree.

  4. Hover over the Path field, then select the ellipsis that appears above it.

    Screenshot showing how to create a pipeline parameter

  5. Select Add to pipeline parameter.

  6. Provide a parameter name and a default value.

    Note

    You can inspect and edit your pipeline parameters by selecting the Settings gear icon next to the title of your pipeline draft.

  7. Select Save.

  8. Submit the pipeline run.
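
Conceptually, a pipeline parameter is a named placeholder with a default value that an individual run can override. The following Python sketch illustrates that idea only; it does not use the designer or the Azure Machine Learning SDK. The `ParameterSpec` class, the `resolve_parameters` function, the parameter name `training_data_path`, and both file paths are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical stand-in for a pipeline parameter: a name plus a default."""
    name: str
    default_value: str


def resolve_parameters(declared, overrides=None):
    """Return the value each declared parameter takes for one run."""
    overrides = overrides or {}
    # Reject overrides for parameters that the pipeline never declared.
    unknown = set(overrides) - {spec.name for spec in declared}
    if unknown:
        raise KeyError(f"Unknown pipeline parameters: {sorted(unknown)}")
    # Use the override when one is given, otherwise fall back to the default.
    return {spec.name: overrides.get(spec.name, spec.default_value)
            for spec in declared}


# Hypothetical parameter name and data paths.
declared = [ParameterSpec("training_data_path", "data/us_income.csv")]

default_run = resolve_parameters(declared)
retrain_run = resolve_parameters(
    declared, {"training_data_path": "data/non_us_income.csv"}
)

print(default_run)
print(retrain_run)
```

A run submitted without overrides trains on the default path, while a retraining run supplies a new path for the same parameter name.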

Find a trained model

The designer saves all pipeline output, including trained models, to the default workspace storage account. You can also access trained models directly in the designer:

  1. Wait for the pipeline to finish running.
  2. Select the Train Model module.
  3. In the module details pane, to the right of the canvas, select Outputs + logs.
  4. You can find your model in Other outputs, along with run logs.
  5. Alternatively, select the View output icon. From here, you can follow the instructions in the dialog to navigate directly to your datastore.

Screenshot showing how to download the trained model

Publish a training pipeline

Publish a pipeline to a pipeline endpoint so that you can easily reuse it later. A pipeline endpoint creates a REST endpoint that you can call to invoke the pipeline. In this example, your pipeline endpoint lets you reuse your pipeline to retrain a model on different data.

  1. Select Publish above the designer canvas.

  2. Select or create a pipeline endpoint.

    Note

    You can publish multiple pipelines to a single endpoint. Each pipeline published to an endpoint is assigned a version number, which you can specify when you call the pipeline endpoint.

  3. Select Publish.

Retrain your model

Now that you have a published training pipeline, you can use it to retrain your model on new data. You can submit runs from a pipeline endpoint either in the studio workspace or programmatically.

Submit runs by using the designer

Use the following steps to submit a parameterized pipeline endpoint run from the designer:

  1. Go to the Endpoints page in your studio workspace.
  2. Select the Pipeline endpoints tab. Then, select your pipeline endpoint.
  3. Select the Published pipelines tab. Then, select the pipeline version that you want to run.
  4. Select Submit.
  5. In the setup dialog box, you can specify the parameter values for the run. For this example, update the data path to train your model using a non-US dataset.

Screenshot showing how to set up a parameterized pipeline run in the designer

Submit runs by using code

You can find the REST endpoint of a published pipeline in the overview panel. By calling the endpoint, you can retrain the published pipeline.

To make a REST call, you need an OAuth 2.0 bearer-type authentication header. For information about setting up authentication to your workspace and making a parameterized REST call, see Build an Azure Machine Learning pipeline for batch scoring.
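
As a rough illustration, the following Python sketch builds a parameterized POST request with a bearer-type header using only the standard library. It is a sketch under stated assumptions, not a definitive client: the JSON keys `ExperimentName` and `ParameterAssignments` are assumed to follow the request pattern in the batch scoring article referenced above, and the endpoint URL, token, experiment name, parameter name `training_data_path`, and data path are all placeholders. The request is constructed but not sent.

```python
import json
import urllib.request


def build_retrain_request(endpoint_url, bearer_token, experiment_name, parameters):
    """Build (but do not send) a POST request for a published pipeline endpoint."""
    # Assumed body shape: an experiment name plus pipeline parameter overrides.
    body = {
        "ExperimentName": experiment_name,
        "ParameterAssignments": parameters,
    }
    return urllib.request.Request(
        url=endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # OAuth 2.0 bearer-type authentication header.
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own endpoint, token, and parameter name.
request = build_retrain_request(
    endpoint_url="https://example.invalid/pipelines/placeholder-id",
    bearer_token="<your-token>",
    experiment_name="retrain-income-model",
    parameters={"training_data_path": "data/non_us_income.csv"},
)

# To submit the run, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before you call the real endpoint.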

Next steps

In this article, you learned how to create a parameterized training pipeline endpoint using the designer.

For a complete walkthrough of how you can deploy a model to make predictions, see the designer tutorial to train and deploy a regression model.