Train Model module

This article describes a module in Azure Machine Learning designer (preview).

Use this module to train a classification or regression model. Training takes place after you have defined a model and set its parameters, and it requires labeled data. You can also use Train Model to retrain an existing model with new data.

How the training process works

In Azure Machine Learning, creating and using a machine learning model is typically a three-step process.

  1. You configure a model by choosing a particular type of algorithm and defining its parameters or hyperparameters. Choose any of the following model types:

    • Classification models, based on neural networks, decision trees, decision forests, and other algorithms.
    • Regression models, which can include standard linear regression or use other algorithms, including neural networks and Bayesian regression.
  2. Provide a labeled dataset whose data is compatible with the algorithm. Connect both the data and the model to Train Model.

    Training produces a model in a specific binary format, the iLearner, which encapsulates the statistical patterns learned from the data. You cannot directly modify or read this format; however, other modules can use the trained model.

    You can also view properties of the model. For more information, see the Results section.

  3. After training is completed, use the trained model with one of the scoring modules to make predictions on new data.
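The three steps can be illustrated outside the designer with a small Python sketch. This is an analogy only. The designer does not expose this code. The names `LinearRegressionConfig`, `TrainedLinearModel`, and `train_model` are hypothetical. A one-feature linear regression trained by batch gradient descent stands in for whatever algorithm you configure. Its hyperparameters `learning_rate` and `epochs` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LinearRegressionConfig:
    """Step 1 (hypothetical): an untrained model is an algorithm choice plus hyperparameters."""
    learning_rate: float = 0.05
    epochs: int = 2000


@dataclass
class TrainedLinearModel:
    """Result of step 2: learned parameters, kept separate from the configuration."""
    weight: float
    bias: float

    def score(self, x: float) -> float:
        """Step 3: predict a value for a new, unlabeled input."""
        return self.weight * x + self.bias


def train_model(config: LinearRegressionConfig,
                rows: List[Tuple[float, Optional[float]]]) -> TrainedLinearModel:
    """Step 2: fit y = w * x + b to the labeled rows with batch gradient descent."""
    # Rows with a missing label (None) are skipped, as Train Model ignores unlabeled rows.
    labeled = [(x, y) for x, y in rows if y is not None]
    if not labeled:
        raise ValueError("training data contains no labeled rows")
    w, b, n = 0.0, 0.0, len(labeled)
    for _ in range(config.epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in labeled) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in labeled) / n
        w -= config.learning_rate * grad_w
        b -= config.learning_rate * grad_b
    return TrainedLinearModel(weight=w, bias=b)


if __name__ == "__main__":
    config = LinearRegressionConfig()                                      # step 1: configure
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, None)]  # last row unlabeled
    model = train_model(config, data)                                      # step 2: train
    print(round(model.score(10.0), 3))                                     # step 3: score new input
```

With this toy data, the fitted line ends up very close to y = 2x + 1, and the unlabeled row is ignored. In this analogy, retraining on new data is another call to `train_model` with the same configuration and the new rows. The configuration and the trained result are deliberately separate types, echoing the distinction between the untrained model and the trained model that Train Model outputs.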

How to use Train Model

  1. In Azure Machine Learning, configure a classification model or regression model.

  2. Add the Train Model module to the pipeline. You can find this module under the Machine Learning category. Expand Train, and then drag the Train Model module into your pipeline.

  3. On the left input, attach the untrained model. Attach the training dataset to the right-hand input of Train Model.

    The training dataset must contain a label column. Any rows without labels are ignored.

  4. For Label column, click Edit column in the right panel of the module, and choose a single column that contains outcomes the model can use for training.

    • For classification problems, the label column must contain either categorical values or discrete values. Examples include a yes/no rating, a disease classification code or name, or an income group. If you pick a noncategorical column, the module returns an error during training.

    • For regression problems, the label column must contain numeric data that represents the response variable. Ideally, the numeric data represents a continuous scale.

    Examples include a credit risk score, the projected time to failure for a hard drive, or the forecasted number of calls to a call center on a given day or time. If you do not choose a numeric column, you might get an error.

    • If you do not specify which label column to use, Azure Machine Learning tries to infer the appropriate label column from the dataset's metadata. If it picks the wrong column, use the column selector to correct it.


    If you have trouble using the column selector, see the article Select Columns in Dataset for tips. It describes some common scenarios and tips for using the WITH RULES and BY NAME options.

  5. Submit the pipeline. If you have a lot of data, training can take a while.
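The label-column rules in steps 3 and 4 can be summarized in code. The Python sketch below is a hypothetical helper, `prepare_label_column`, and is not part of Azure Machine Learning. It drops rows whose label is missing and then checks the label values against the task type. Its type rules are simplifying assumptions. For classification, it treats strings, booleans, and integers as categorical or discrete. For regression, it accepts only non-boolean integers and floats.

```python
from typing import Any, Dict, List


def prepare_label_column(rows: List[Dict[str, Any]], label: str,
                         task: str) -> List[Dict[str, Any]]:
    """Drop rows without a label and check the label column against the task type.

    Hypothetical helper; the type rules below are simplifying assumptions.
    """
    # Rows whose label is missing (absent or None) are ignored, not treated as errors.
    labeled = [row for row in rows if row.get(label) is not None]
    if not labeled:
        raise ValueError(f"label column {label!r} has no values")
    values = [row[label] for row in labeled]
    if task == "classification":
        # Categorical or discrete labels: strings, booleans, or integers.
        # A float label is treated as continuous and rejected.
        if not all(isinstance(v, (str, bool, int)) for v in values):
            raise TypeError(f"classification label {label!r} must be categorical or discrete")
    elif task == "regression":
        # Numeric labels only; bool is excluded even though it is a subclass of int.
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            raise TypeError(f"regression label {label!r} must be numeric")
    else:
        raise ValueError(f"unknown task {task!r}")
    return labeled


if __name__ == "__main__":
    # Illustrative rows; each task skips the row whose chosen label is None.
    rows = [
        {"income_group": "low", "income": 21000.0},
        {"income_group": None, "income": 87000.0},
        {"income_group": "high", "income": None},
    ]
    print(len(prepare_label_column(rows, "income_group", "classification")))  # 2
    print(len(prepare_label_column(rows, "income", "regression")))           # 2
```

Choosing `income` as a classification label in this sketch raises a `TypeError`, because its values are floats. This roughly mirrors the error the module reports when a noncategorical column is selected.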


After the model is trained:

  • To use the model in other pipelines, select the module, and then select the Register dataset icon on the Outputs tab in the right panel. You can access saved models in the module palette under Datasets.

  • To predict new values, connect the trained model to the Score Model module together with new input data.

Next steps

See the set of modules available to Azure Machine Learning.