Multiclass Boosted Decision Tree

This article describes a module in Azure Machine Learning designer (preview).

Use this module to create a machine learning model based on the boosted decision trees algorithm.

A boosted decision tree is an ensemble learning method in which the second tree corrects for the errors of the first tree, the third tree corrects for the errors of the first and second trees, and so forth. Predictions are based on the whole ensemble of trees together.
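
The error-correcting idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the algorithm the designer module runs: it assumes a one-dimensional regression problem with squared error, uses one-split "stumps" in place of full trees, and the names `fit_stump`, `predict`, `boost`, `num_trees`, and `learning_rate` are hypothetical.

```python
# Illustrative sketch only: NOT the implementation used by the designer module.
# Each "tree" is a one-split stump fitted to the residuals (errors) left by
# the ensemble built so far.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)  # left always contains the threshold point
        right_value = sum(right) / len(right) if right else 0.0
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict(stumps, learning_rate, x):
    """Sum the scaled contributions of every stump in the ensemble."""
    total = 0.0
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


def boost(xs, ys, num_trees, learning_rate):
    """Add stumps one at a time and record the training error after each."""
    stumps, errors = [], []
    for _ in range(num_trees):
        # Residuals are the errors of all stumps added so far.
        residuals = [y - predict(stumps, learning_rate, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        errors.append(sum((y - predict(stumps, learning_rate, x)) ** 2
                          for x, y in zip(xs, ys)))
    return stumps, errors


# Made-up toy data for demonstration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.5, 3.0, 3.5, 6.0, 6.5]
stumps, errors = boost(xs, ys, num_trees=10, learning_rate=0.5)
print([round(e, 3) for e in errors])  # training error after each added stump
```

In this sketch the training error never increases as stumps are added, because each new stump is fitted to whatever error the previous ones left behind, and the final prediction is the sum over all stumps.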

How to configure

This module creates an untrained classification model. Because classification is a supervised learning method, you need a labeled dataset that includes a label column with a value in every row.
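
Before training, it can help to confirm that every row actually carries a label. The following is a minimal sketch in plain Python, assuming the dataset is available as a list of dictionaries; the helper `rows_missing_label` and the column names are hypothetical and are not part of the designer.

```python
def rows_missing_label(rows, label_column):
    """Return the indexes of rows whose label is absent, None, or blank."""
    missing = []
    for index, row in enumerate(rows):
        value = row.get(label_column)
        if value is None or str(value).strip() == "":
            missing.append(index)
    return missing


# Hypothetical sample rows; "species" plays the role of the label column.
rows = [
    {"petal_length": 1.4, "species": "setosa"},
    {"petal_length": 4.7, "species": "versicolor"},
    {"petal_length": 5.9, "species": ""},
    {"petal_length": 5.1},
]
print(rows_missing_label(rows, "species"))  # prints [2, 3]
```

Rows reported by a check like this would need a label filled in, or would need to be removed, before the dataset is used for training.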

You can train this type of model by using the Train Model module.

  1. Add the Multiclass Boosted Decision Tree module to your pipeline.

  2. Specify how you want the model to be trained by setting the Create trainer mode option.

    • Single Parameter: If you know how you want to configure the model, you can provide a specific set of values as arguments.

    • Parameter Range: Select this option if you are not sure of the best parameters and want to run a parameter sweep. Select a range of values to iterate over; the Tune Model Hyperparameters module then iterates over all possible combinations of the settings you provided to determine the hyperparameters that produce the optimal results.

    • Maximum number of leaves per tree limits the maximum number of terminal nodes (leaves) that can be created in any tree.

      Increasing this value potentially increases the size of the tree and yields higher precision, at the risk of overfitting and longer training time.

    • Minimum number of samples per leaf node indicates the minimum number of cases required to create any terminal node (leaf) in a tree.

      Increasing this value raises the threshold for creating new rules. For example, with the default value of 1, even a single case can cause a new rule to be created. If you increase the value to 5, the training data must contain at least five cases that meet the same conditions.

    • Learning rate defines the step size while learning. Enter a number between 0 and 1.

      The learning rate determines how fast or slow the learner converges on an optimal solution. If the step size is too large, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge on the best solution.

    • Number of trees constructed indicates the total number of decision trees to create in the ensemble. By creating more decision trees, you can potentially get better coverage, but training time will increase.

    • Random number seed optionally sets a non-negative integer to use as the random seed value. Specifying a seed ensures reproducibility across runs that have the same data and parameters.

      The random seed is set by default to 42. Successive runs using different random seeds can have different results.

  3. Train the model:

    • If you set Create trainer mode to Single Parameter, connect a labeled dataset and the Train Model module.

    • If you set Create trainer mode to Parameter Range, connect a labeled dataset and train the model by using Tune Model Hyperparameters.

    Note

    If you pass a parameter range to Train Model, it uses only the default value in the single parameter list.

    If you pass a single set of parameter values to the Tune Model Hyperparameters module, which expects a range of settings for each parameter, it ignores the values and uses the default values for the learner.

    If you select the Parameter Range option and enter a single value for any parameter, that single value is used throughout the sweep, even if other parameters change across a range of values.
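
To make the sweep behavior concrete, the sketch below enumerates every combination of a set of hyperparameter ranges and treats a single value as a range of one, so that it stays fixed while the other settings vary. It only illustrates the "all possible combinations" idea described above; it is not how Tune Model Hyperparameters is implemented, and the function `expand_sweep` and the dictionary keys are hypothetical names chosen to mirror the settings in this article.

```python
from itertools import product


def expand_sweep(settings):
    """Yield one dict per combination of the supplied hyperparameter values."""
    names = list(settings)
    # A single value is treated as a one-element range, so it stays constant.
    ranges = [value if isinstance(value, (list, tuple)) else [value]
              for value in settings.values()]
    for combo in product(*ranges):
        yield dict(zip(names, combo))


# Hypothetical sweep definition; the key names are illustrative only.
sweep = {
    "max_leaves_per_tree": [8, 32],
    "min_samples_per_leaf": [1, 5],
    "learning_rate": [0.1, 0.2, 0.4],
    "num_trees": 100,  # single value: held constant across the sweep
}

combos = list(expand_sweep(sweep))
print(len(combos))  # 2 * 2 * 3 * 1 = 12 combinations
print(combos[0])
```

Because the number of combinations is the product of the range sizes, adding values to any one range multiplies the number of models that have to be trained.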

Next steps

See the set of modules available to Azure Machine Learning.