Multiclass Logistic Regression module

This article describes a module in Azure Machine Learning designer.

Use this module to create a logistic regression model that predicts a label with more than two possible values.

Classification using logistic regression is a supervised learning method, and therefore requires a labeled dataset. You train the model by providing the untrained model and the labeled dataset as inputs to a module such as Train Model. The trained model can then be used to predict values for new input examples.

Azure Machine Learning also provides a Two-Class Logistic Regression module, which is suited for classification of binary or dichotomous variables.

About multiclass logistic regression

Logistic regression is a well-known method in statistics that is used to predict the probability of an outcome, and is popular for classification tasks. The algorithm predicts the probability of occurrence of an event by fitting data to a logistic function.

In multiclass logistic regression, the classifier can be used to predict more than two possible outcomes.
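
As a rough illustration of how a multiclass logistic model turns inputs into class probabilities, the sketch below computes one linear score per class and normalizes the scores with the softmax function. This is a generic textbook formulation, not the module's internal implementation; the coefficient values and the names `WEIGHTS`, `BIASES`, `predict_proba`, and `predict` are hypothetical.

```python
import math


def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, features):
    """Compute a linear score for each class, then apply softmax."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + bias
        for class_weights, bias in zip(weights, biases)
    ]
    return softmax(scores)


def predict(weights, biases, features):
    """Return the index of the most probable class."""
    probabilities = predict_proba(weights, biases, features)
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical coefficients: three classes, two features.
WEIGHTS = [[2.0, -1.0], [-1.0, 2.0], [0.0, 0.0]]
BIASES = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    print(predict_proba(WEIGHTS, BIASES, [1.0, 0.0]))
    print(predict(WEIGHTS, BIASES, [1.0, 0.0]))
```

With two classes, the same computation reduces to the familiar logistic (sigmoid) function, which is the case the Two-Class Logistic Regression module covers.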

Configure a multiclass logistic regression model

  1. Add the Multiclass Logistic Regression module to the pipeline.

  2. Specify how you want the model to be trained by setting the Create trainer mode option.

    • Single Parameter: Use this option if you know how you want to configure the model, and provide a specific set of values as arguments.

    • Parameter Range: Select this option if you are not sure of the best parameters and want to run a parameter sweep. Select a range of values to iterate over, and the Tune Model Hyperparameters module iterates over all possible combinations of the settings you provided to determine the hyperparameters that produce the optimal results.

  3. For Optimization tolerance, specify the threshold value for optimizer convergence. If the improvement between iterations is less than the threshold, the algorithm stops and returns the current model.

  4. L1 regularization weight, L2 regularization weight: Type a value to use for the regularization parameters L1 and L2. A non-zero value is recommended for both.

    Regularization is a method for preventing overfitting by penalizing models with extreme coefficient values. Regularization works by adding the penalty that is associated with coefficient values to the error of the hypothesis. An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.

    L1 and L2 regularization have different effects and uses. L1 can be applied to sparse models, which is useful when working with high-dimensional data. In contrast, L2 regularization is preferable for data that is not sparse. This algorithm supports a linear combination of L1 and L2 regularization values: that is, if x = L1 and y = L2, then ax + by = c defines the linear span of the regularization terms.

    Different linear combinations of L1 and L2 terms have been devised for logistic regression models, such as elastic net regularization.

  5. Random number seed: Type an integer value to use as the seed for the algorithm if you want the results to be repeatable over runs. Otherwise, a system clock value is used as the seed, which can produce slightly different results in runs of the same pipeline.

  6. Connect a labeled dataset, and train the model:

    • If you set Create trainer mode to Single Parameter, connect a labeled dataset and the Train Model module.

    • If you set Create trainer mode to Parameter Range, connect a labeled dataset and train the model by using Tune Model Hyperparameters.

    Note

    If you pass a parameter range to Train Model, it uses only the default value in the single parameter list.

    If you pass a single set of parameter values to the Tune Model Hyperparameters module, which expects a range of settings for each parameter, it ignores the values and uses the default values for the learner.

    If you select the Parameter Range option and enter a single value for any parameter, that single value is used throughout the sweep, even if other parameters change across a range of values.

  7. Submit the pipeline.
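
The settings in the steps above can be sketched in plain Python to show what each one controls. This is an illustration under stated assumptions, not the module's implementation: the penalty uses a common elastic-net-style form (the exact formula the module applies is not given here), and the function names and the dictionary keys `l1_weight`, `l2_weight`, and `optimization_tolerance` are hypothetical labels rather than designer API names.

```python
import itertools
import random


def penalty(coefficients, l1_weight, l2_weight):
    """Assumed elastic-net-style penalty added to the training error."""
    l1_term = sum(abs(c) for c in coefficients)  # sum of absolute values
    l2_term = sum(c * c for c in coefficients)   # sum of squares
    return l1_weight * l1_term + l2_weight * l2_term


def has_converged(previous_loss, current_loss, tolerance):
    """Stop when the improvement between iterations is below the threshold."""
    return previous_loss - current_loss < tolerance


def parameter_grid(ranges):
    """Yield every combination of the supplied values, one dict per combination."""
    names = sorted(ranges)
    for values in itertools.product(*(ranges[name] for name in names)):
        yield dict(zip(names, values))


def initial_weights(count, seed):
    """Draw starting weights from a seeded generator so runs are repeatable."""
    generator = random.Random(seed)
    return [generator.uniform(-0.1, 0.1) for _ in range(count)]


if __name__ == "__main__":
    sweep = {
        "l1_weight": [0.1, 1.0],
        "l2_weight": [0.1, 1.0],
        "optimization_tolerance": [1e-7],  # single value: held fixed in every run
    }
    for settings in parameter_grid(sweep):
        print(settings)
```

Larger coefficients produce a larger penalty, which is why a model with extreme coefficient values is penalized more. A parameter given a single value in the sweep stays fixed across all combinations, which mirrors the behavior described in the note for the Parameter Range option.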

Next steps

See the set of modules available to Azure Machine Learning.