Create, review, and deploy automated machine learning models with Azure Machine Learning

In this article, you learn how to create, explore, and deploy automated machine learning models in Azure Machine Learning studio without writing a single line of code.

Automated machine learning is a process that selects the best machine learning algorithm for your specific data. This process enables you to generate machine learning models quickly. Learn more about automated machine learning.

For an end-to-end example, try the tutorial for creating a classification model with Azure Machine Learning's automated ML interface.

For a Python code-based experience, configure your automated machine learning experiments with the Azure Machine Learning SDK.

Prerequisites

Get started

  1. Sign in to Azure Machine Learning at https://studio.ml.azure.cn.

  2. Select your subscription and workspace.

  3. Navigate to the left pane. Select Automated ML under the Author section.

Azure Machine Learning studio navigation pane

If this is your first time running an experiment, you'll see an empty list and links to documentation.

Otherwise, you'll see a list of your recent automated machine learning experiments, including those created with the SDK.

Create and run experiment

  1. Select + New automated ML run and populate the form.

  2. Select a dataset from your storage container, or create a new dataset. Datasets can be created from local files, web URLs, datastores, or Azure Open Datasets. Learn more about dataset creation.

    Important

    Requirements for training data:

    • Data must be in tabular form.
    • The value you want to predict (target column) must be present in the data.

    1. To create a new dataset from a file on your local computer, select +Create dataset and then select From local file.

    2. In the Basic info form, give your dataset a unique name and provide an optional description.

    3. Select Next to open the Datastore and file selection form. On this form, you select where to upload your dataset: either the default storage container that's automatically created with your workspace, or a storage container that you want to use for the experiment.

      1. If your data is behind a virtual network, you need to enable the Skip validation function to ensure that the workspace can access your data. For more information, see Use Azure Machine Learning studio in an Azure virtual network.
    4. Select Browse to upload the data file for your dataset.

    5. Review the Settings and preview form for accuracy. The form is intelligently populated based on the file type.

      Field | Description
      File format | Defines the layout and type of data stored in a file.
      Delimiter | One or more characters for specifying the boundary between separate, independent regions in plain text or other data streams.
      Encoding | Identifies which bit-to-character schema table to use to read your dataset.
      Column headers | Indicates how the headers of the dataset, if any, will be treated.
      Skip rows | Indicates how many rows, if any, are skipped in the dataset.

      Select Next.

    6. The Schema form is intelligently populated based on the selections in the Settings and preview form. Here, configure the data type for each column, review the column names, and select which columns to Not include for your experiment.

      Select Next.

    7. The Confirm details form is a summary of the information previously populated in the Basic info and Settings and preview forms. You also have the option to create a data profile for your dataset using a profiling-enabled compute.

      Select Next.

  3. Select your newly created dataset once it appears. You can also view a preview of the dataset and sample statistics.

  4. On the Configure run form, enter a unique experiment name.

  5. Select a target column; this is the column that you want to predict.

  6. Select a compute for the data profiling and training job. A list of your existing computes is available in the dropdown. To create a new compute, follow the instructions in step 7.

  7. Select Create a new compute to configure your compute context for this experiment.

    Field | Description
    Compute name | Enter a unique name that identifies your compute context.
    Virtual machine priority | Low priority virtual machines are cheaper but don't guarantee that compute nodes are available.
    Virtual machine type | Select CPU or GPU for virtual machine type.
    Virtual machine size | Select the virtual machine size for your compute.
    Min / Max nodes | To profile data, you must specify 1 or more nodes. Enter the maximum number of nodes for your compute. The default is 6 nodes for an AML Compute.
    Advanced settings | These settings allow you to configure a user account and existing virtual network for your experiment.

    Select Create. Creation of a new compute can take a few minutes.

    Note

    Your compute name will indicate if the compute you select/create is profiling enabled.

    Select Next.

  8. On the Task type and settings form, select the task type: classification, regression, or forecasting. See supported task types for more information.

    1. For classification, you can also enable deep learning.

      If deep learning is enabled, validation is limited to train_validation split. Learn more about validation options.

    2. For forecasting, you can:

      1. Enable deep learning.

      2. Select time column: This column contains the time data to be used.

      3. Select forecast horizon: Indicate how many time units (minutes/hours/days/weeks/months/years) into the future the model will be able to predict. The further the model is required to predict into the future, the less accurate it becomes. Learn more about forecasting and forecast horizon.

  9. (Optional) View additional configuration settings: additional settings you can use to better control the training job. Otherwise, defaults are applied based on experiment selection and data.

    Additional configurations | Description
    Primary metric | Main metric used for scoring your model. Learn more about model metrics.
    Explain best model | Select to enable or disable, in order to show explanations for the recommended best model. This functionality is not currently available for certain forecasting algorithms.
    Blocked algorithm | Select algorithms you want to exclude from the training job. Allowing algorithms is only available for SDK experiments. See the supported models for each task type.
    Exit criterion | When any of these criteria are met, the training job is stopped. Training job time (hours): How long to allow the training job to run. Metric score threshold: Minimum metric score for all pipelines. This ensures that if you have a defined target metric you want to reach, you do not spend more time on the training job than necessary.
    Validation | Select one of the cross validation options to use in the training job. Learn more about cross validation. Forecasting only supports k-fold cross validation.
    Concurrency | Max concurrent iterations: Maximum number of pipelines (iterations) to test in the training job. The job will not run more than the specified number of iterations.

  10. (Optional) View featurization settings: if you choose to enable Automatic featurization in the Additional configuration settings form, default featurization techniques are applied. In View featurization settings, you can change these defaults and customize accordingly. Learn how to customize featurizations.

    Azure Machine Learning studio task type form
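The exit criteria described in step 9 can be pictured as a loop that stops as soon as either limit is reached. The following is only an illustrative sketch, not how the service is implemented: the `ExitCriteria` class, the `run_until_exit` function, and the sample pipeline names and scores are all hypothetical, and the sketch assumes a metric where a higher score is better.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ExitCriteria:
    """Hypothetical container mirroring the two exit settings in the form."""
    max_hours: float                         # Training job time (hours)
    score_threshold: Optional[float] = None  # Metric score threshold


def run_until_exit(
    pipelines: Iterable[Tuple[str, float, float]],
    criteria: ExitCriteria,
) -> Tuple[List[Tuple[str, float]], str]:
    """Try pipelines in order until any exit criterion is met.

    Each pipeline is a (name, score, hours_taken) tuple with made-up values.
    Returns the pipelines that were tried and the reason the job stopped.
    """
    elapsed = 0.0
    tried: List[Tuple[str, float]] = []
    for name, score, hours in pipelines:
        # Stop before starting another pipeline once the time budget is spent.
        if elapsed >= criteria.max_hours:
            return tried, "time limit"
        elapsed += hours
        tried.append((name, score))
        # Stop early once the target score is reached (assumes higher is better).
        if criteria.score_threshold is not None and score >= criteria.score_threshold:
            return tried, "score threshold"
    return tried, "all pipelines tried"


candidates = [("pipeline-a", 0.70, 0.5), ("pipeline-b", 0.92, 0.5), ("pipeline-c", 0.95, 0.5)]
tried, reason = run_until_exit(candidates, ExitCriteria(max_hours=3, score_threshold=0.9))
print(reason, [name for name, _ in tried])
```

With a threshold set, the job ends when the first pipeline reaches it, even though a later pipeline might have scored higher; that is the trade-off the metric score threshold makes in exchange for a shorter training job.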

Customize featurization

In the Featurization form, you can enable/disable automatic featurization and customize the automatic featurization settings for your experiment. To open this form, see step 10 in the Create and run experiment section.

The following table summarizes the customizations currently available via the studio.

Column | Customization
Included | Specifies which columns to include for training.
Feature type | Change the value type for the selected column.
Impute with | Select the value to use for imputing missing values in your data.

Azure Machine Learning studio task type form
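To make the Impute with setting concrete, here is a minimal sketch of what imputing missing values in a single column means. The `impute` function and its strategy names ("mean", "median", "constant") are assumptions chosen for illustration; the options the studio actually offers may differ, and this is not the service's implementation.

```python
from statistics import mean, median
from typing import List, Optional


def impute(
    column: List[Optional[float]],
    strategy: str = "mean",
    fill_value: Optional[float] = None,
) -> List[float]:
    """Replace missing entries (None) in a numeric column.

    The strategy names are hypothetical labels used only in this sketch.
    """
    present = [value for value in column if value is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "constant":
        if fill_value is None:
            raise ValueError("fill_value is required for the 'constant' strategy")
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep observed values untouched; only the gaps receive the fill value.
    return [fill if value is None else value for value in column]


print(impute([1.0, None, 3.0]))
print(impute([1.0, None, 3.0, 8.0], strategy="median"))
print(impute([1.0, None], strategy="constant", fill_value=0.0))
```

The fill value is computed only from the values that are present, so the choice of strategy matters most for columns with outliers or many gaps.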

Run experiment and view results

Select Finish to run your experiment. The experiment preparation process can take up to 10 minutes. Training jobs can take an additional 2-3 minutes for each pipeline to finish running.
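The timing figures above allow a rough estimate of how long a run might take. The sketch below assumes that preparation takes the full 10 minutes and that pipelines run one after another; with concurrent iterations the wall-clock time would be shorter. The `estimate_run_minutes` function is hypothetical and only restates the arithmetic.

```python
from typing import Tuple


def estimate_run_minutes(
    n_pipelines: int,
    prep_minutes: int = 10,
    per_pipeline_minutes: Tuple[int, int] = (2, 3),
) -> Tuple[int, int]:
    """Return a (low, high) estimate in minutes for a sequential run.

    Assumes the worst-case preparation time and that pipelines do not overlap.
    """
    if n_pipelines < 0:
        raise ValueError("n_pipelines must be non-negative")
    low, high = per_pipeline_minutes
    return prep_minutes + n_pipelines * low, prep_minutes + n_pipelines * high


low, high = estimate_run_minutes(20)
print(f"Roughly {low} to {high} minutes for 20 pipelines")
```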

View experiment details

The Run Detail screen opens to the Details tab. This screen shows you a summary of the experiment run, including a status bar at the top next to the run number.

The Models tab contains a list of the models created, ordered by metric score. By default, the model that scores the highest based on the chosen metric is at the top of the list. As the training job tries out more models, they are added to the list. Use this list to get a quick comparison of the metrics for the models produced so far.

Run details dashboard

View training run details

Drill down on any of the completed models to see training run details, like a model summary on the Model tab or performance metric charts on the Metrics tab. Learn more about charts.

Iteration details

Deploy your model

Once you have the best model at hand, you can deploy it as a web service to predict on new data.

Automated ML helps you deploy the model without writing code:

  1. You have two options for deployment.

    • Option 1: Deploy the best model, according to the metric criteria you defined.

      1. After the experiment is complete, navigate to the parent run page by selecting Run 1 at the top of the screen.
      2. Select the model listed in the Best model summary section.
      3. Select Deploy on the top left of the window.
    • Option 2: Deploy a specific model iteration from this experiment.

      1. Select the desired model from the Models tab.
      2. Select Deploy on the top left of the window.
  2. Populate the Deploy model pane.

    Field | Value
    Name | Enter a unique name for your deployment.
    Description | Enter a description to better identify what this deployment is for.
    Compute type | Select the type of endpoint you want to deploy: Azure Kubernetes Service (AKS) or Azure Container Instance (ACI).
    Compute name | Applies to AKS only: Select the name of the AKS cluster you wish to deploy to.
    Enable authentication | Select to allow for token-based or key-based authentication.
    Use custom deployment assets | Enable this feature if you want to upload your own scoring script and environment file. Learn more about scoring scripts.

    Important

    File names must be under 32 characters and must begin and end with alphanumerics. They may include dashes, underscores, dots, and alphanumerics in between. Spaces are not allowed.

    The Advanced menu offers default deployment features such as data collection and resource utilization settings. To override these defaults, do so in this menu.

  3. Select Deploy. Deployment can take about 20 minutes to complete. Once deployment begins, the Model summary tab appears. See the deployment progress under the Deploy status section.
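The naming rule in the Important note under step 2 can be checked before you fill in the form. The following sketch encodes that rule as a regular expression; the `is_valid_name` helper is hypothetical, and it reads "under 32 characters" as a maximum of 31 characters, which is an interpretation rather than a documented limit.

```python
import re

# First and last characters must be alphanumeric; dashes, underscores, dots,
# and alphanumerics are allowed in between. Spaces never match.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def is_valid_name(name: str, max_length: int = 31) -> bool:
    """Return True if the name satisfies the rule quoted in the note.

    max_length=31 is an assumption derived from "under 32 characters".
    """
    return len(name) <= max_length and _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-model.v1", "-starts-with-dash", "has space", "a" * 32]:
    print(candidate, is_valid_name(candidate))
```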

Now you have an operational web service to generate predictions! You can test the predictions by querying the service from Power BI's built-in Azure Machine Learning support.
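If you prefer to test the service from code rather than Power BI, a request can be prepared with the Python standard library. This is a hedged sketch only: the scoring URI, the `{"data": [...]}` payload shape, the `Bearer` authorization header, and the column names are all assumptions, so check the deployed endpoint's details for the format it actually expects. The `build_scoring_request` helper is hypothetical, and the sketch builds the request without sending it.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_scoring_request(
    scoring_uri: str,
    rows: List[Dict[str, Any]],
    key: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for a scoring endpoint.

    The payload layout and header scheme are assumptions for illustration.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    request = urllib.request.Request(scoring_uri, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if key:
        # Only needed when authentication was enabled at deployment time.
        request.add_header("Authorization", f"Bearer {key}")
    return request


# Placeholder URI, column names, and key; replace them with your own values.
request = build_scoring_request(
    "https://example.invalid/score",
    [{"age": 42, "job": "technician"}],
    key="placeholder-key",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and read the response body; the sketch stops short of that so it runs without a live endpoint.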

Next steps