One-vs-All Multiclass

This article describes how to use the One-vs-All Multiclass module in Azure Machine Learning designer. The goal is to create a classification model that can predict multiple classes by using the one-versus-all approach.

This module is useful for creating models that predict three or more possible outcomes when the outcome depends on continuous or categorical predictor variables. It also lets you apply binary classification methods to problems that require multiple output classes.

More about one-versus-all models

Some classification algorithms permit the use of more than two classes by design. Others restrict the possible outcomes to one of two values (a binary, or two-class, model). But even binary classification algorithms can be adapted for multiclass classification tasks through a variety of strategies.

This module implements the one-versus-all method, in which a binary model is created for each of the output classes. The module assesses each of these binary models against its complement (all other classes in the model) as though it were a binary classification problem. Besides its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Because each class is represented by exactly one classifier, you can learn about a class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice. The module then performs prediction by running these binary classifiers and choosing the prediction with the highest confidence score.
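The one-versus-all procedure described above can be sketched in a few lines of plain Python. This is an illustration only, not the module's implementation: the binary scorer here is a toy nearest-centroid rule chosen so the example runs without dependencies, and the names `fit_binary`, `fit_one_vs_all`, and `predict_one_vs_all` are hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Return the coordinate-wise mean of a list of points."""
    return tuple(fmean(axis) for axis in zip(*points))


def fit_binary(positives, negatives):
    """Toy binary scorer (an assumption for this sketch): the score is
    positive when x is closer to the class centroid than to the rest."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return lambda x: dist(x, neg_c) - dist(x, pos_c)


def fit_one_vs_all(X, y):
    """Train one binary scorer per class, with all other classes as negatives."""
    models = {}
    for label in sorted(set(y)):
        positives = [x for x, t in zip(X, y) if t == label]
        negatives = [x for x, t in zip(X, y) if t != label]
        models[label] = fit_binary(positives, negatives)
    return models


def predict_one_vs_all(models, x):
    """Run every binary scorer and return the class with the highest score."""
    return max(models, key=lambda label: models[label](x))


# Three well-separated clusters in two dimensions.
X = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = fit_one_vs_all(X, y)
predictions = [predict_one_vs_all(models, x) for x in X]
new_point_label = predict_one_vs_all(models, (9, 1))
```

Three classes yield exactly three binary scorers, and each scorer can be inspected on its own, which is the interpretability point made above.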

In essence, the module creates an ensemble of individual models and then merges the results to create a single model that predicts all classes. Any binary classifier can be used as the basis for a one-versus-all model.

For example, suppose you configure a Two-Class Support Vector Machine model and provide it as input to the One-vs-All Multiclass module. The module would create one two-class support vector machine model for each output class. It would then apply the one-versus-all method to combine the results for all classes.

The module uses the OneVsRestClassifier class from sklearn (scikit-learn). For more details, see the scikit-learn documentation for OneVsRestClassifier.
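As a rough illustration of the support vector machine example, the following sketch calls scikit-learn's `OneVsRestClassifier` directly. How the designer module wires this class internally is not documented here, so the linear `SVC` base estimator and the iris dataset are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Iris has three classes, so three binary SVMs are expected.
X, y = load_iris(return_X_y=True)

# Wrap a two-class-capable SVM; the wrapper fits one copy per class.
ovr = OneVsRestClassifier(SVC(kernel="linear"))
ovr.fit(X, y)

n_binary_models = len(ovr.estimators_)

# Each column holds one binary model's confidence score for its class.
scores = ovr.decision_function(X)

# Choosing the highest-scoring class by hand should match predict().
manual = ovr.classes_[np.argmax(scores, axis=1)]
predictions = ovr.predict(X)
agreement = float(np.mean(manual == predictions))
```

The fitted binary models are available in `ovr.estimators_`, one per class, in the same order as `ovr.classes_`.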

How to configure the One-vs-All Multiclass classifier

This module creates an ensemble of binary classification models to analyze multiple classes. To use this module, you need to configure and train a binary classification model first.

You connect the binary model to the One-vs-All Multiclass module. You then train the ensemble of models by using the Train Model module with a labeled training dataset.

When you combine the models, One-vs-All Multiclass creates multiple binary classification models, optimizes the algorithm for each class, and then merges the models. The module does these tasks even though the training dataset might have multiple class values.

  1. Add the One-vs-All Multiclass module to your pipeline in the designer. You can find this module under Machine Learning - Initialize, in the Classification category.

    The One-vs-All Multiclass classifier has no configurable parameters of its own. Any customizations must be made in the binary classification model that's provided as input.

  2. Add a binary classification model to the pipeline, and configure that model. For example, you might use Two-Class Support Vector Machine or Two-Class Boosted Decision Tree.

  3. Add the Train Model module to your pipeline. Connect the untrained classifier that is the output of One-vs-All Multiclass.

  4. On the other input of Train Model, connect a labeled training dataset that has multiple class values.

  5. Submit the pipeline.
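The designer steps above have a rough code analogue in scikit-learn, sketched here for orientation only. The designer itself is a visual tool, so the mapping is an assumption: `GradientBoostingClassifier` stands in for Two-Class Boosted Decision Tree, the `fit` call stands in for Train Model, and the iris dataset and hyperparameter values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# A labeled dataset with multiple class values (step 4).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 2: configure the binary model. All customization happens here,
# because the one-vs-all wrapper has no tuning parameters of its own.
base_model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

# Steps 1 and 3: wrap the base model to get an untrained multiclass classifier.
untrained = OneVsRestClassifier(base_model)

# Steps 3 to 5: train on the labeled data, then score held-out rows.
trained = untrained.fit(X_train, y_train)
accuracy = accuracy_score(y_test, trained.predict(X_test))
```

Changing the behavior of the ensemble means changing `base_model`, which mirrors the note that customizations belong in the binary classification model.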

Results

After training is complete, you can use the model to make multiclass predictions.

Alternatively, you can pass the untrained classifier to the Cross-Validate Model module for cross-validation against a labeled validation dataset.
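Cross-validating an untrained one-versus-all classifier can be sketched with scikit-learn as follows. This is an analogue of the Cross-Validate Model workflow rather than its implementation; the logistic regression base model, the five folds, and the iris dataset are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# The classifier is passed in untrained; each fold fits its own copy.
untrained = OneVsRestClassifier(LogisticRegression(max_iter=1000))

# Stratified folds keep all three classes present in every split.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(untrained, X, y, cv=folds)
mean_accuracy = fold_scores.mean()
```

`cross_val_score` clones the estimator for every fold, so `untrained` itself stays unfitted afterward and can still be trained on the full dataset.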

Next steps

See the set of modules available to Azure Machine Learning.