Two-Class Support Vector Machine module

This article describes a module in Azure Machine Learning designer (preview).

Use this module to create a model that is based on the support vector machine algorithm.

Support vector machines (SVMs) are a well-researched class of supervised learning methods. This particular implementation is suited to predicting one of two possible outcomes, based on either continuous or categorical variables.

After defining the model parameters, train the model by connecting it to one of the training modules and providing a labeled dataset that includes a label or outcome column.

About support vector machines

Support vector machines are among the earliest machine learning algorithms, and SVM models have been used in many applications, from information retrieval to text and image classification. SVMs can be used for both classification and regression tasks.

This SVM model is a supervised learning model that requires labeled data. In the training process, the algorithm analyzes input data and recognizes patterns in a multi-dimensional feature space, separating the classes with a boundary called the hyperplane. All input examples are represented as points in this space, and are mapped to output categories in such a way that the categories are divided by as wide and clear a gap as possible.

For prediction, the SVM algorithm maps new examples into that same space and assigns each one to one category or the other, depending on which side of the gap it falls on.
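The prediction step can be illustrated with a minimal sketch. It assumes a linear model whose boundary is given by a weight vector and a bias; the function names and the example hyperplane are hypothetical and are not part of the module.

```python
import math

def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign shows which side of the hyperplane the point is on."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def predict(weights, bias, point):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1

def margin_width(weights):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))

# Hypothetical hyperplane x1 + x2 - 3 = 0, chosen only for illustration.
weights, bias = [1.0, 1.0], -3.0
print(predict(weights, bias, [3.0, 2.0]))  # point on the positive side
print(predict(weights, bias, [0.5, 1.0]))  # point on the negative side
print(margin_width(weights))
```

A smaller weight norm corresponds to a wider gap between the classes, which is the quantity that training tries to make as large as possible.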

How to configure

For this model type, we recommend that you normalize the dataset before using it to train the classifier.
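As a rough illustration of what normalizing a dataset involves, the sketch below centers each column at its mean and scales it to unit standard deviation. The helper name is hypothetical, the use of the population standard deviation is an assumption, and the designer's own normalization may differ in detail.

```python
import statistics

def standardize_columns(rows):
    """Center each column at its mean and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation; constant columns are left unscaled.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two features on very different scales.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
for row in standardize_columns(rows):
    print(row)
```

After scaling, both features contribute on a comparable scale, so neither dominates the distance to the separating boundary simply because of its units.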

  1. Add the Two-Class Support Vector Machine module to your pipeline.

  2. Specify how you want the model to be trained by setting the Create trainer mode option.

    • Single Parameter: If you know how you want to configure the model, you can provide a specific set of values as arguments.

    • Parameter Range: If you are not sure of the best parameters, you can find the optimal parameters by using the Tune Model Hyperparameters module. You provide a range of values, and the trainer iterates over multiple combinations of the settings to determine the combination that produces the best result.

  3. For Number of iterations, type the number of iterations to use when building the model.

    This parameter controls the trade-off between training speed and accuracy.

  4. For Lambda, type a value to use as the weight for L1 regularization.

    This regularization coefficient can be used to tune the model. Larger values penalize more complex models.

  5. Select the Normalize features option if you want to normalize features before training.

    If you apply normalization, then before training, data points are centered at the mean and scaled to have one unit of standard deviation.

  6. Select the Project to the unit sphere option to normalize coefficients.

    Projecting values to unit space means that before training, data points are centered at 0 and scaled to have one unit of standard deviation.

  7. In Random number seed, type an integer value to use as a seed if you want to ensure reproducibility across runs. Otherwise, a system clock value is used as the seed, which can produce slightly different results across runs.

  8. Connect a labeled dataset, and train the model:

    • If you set Create trainer mode to Single Parameter, connect a labeled dataset and the Train Model module.

    • If you set Create trainer mode to Parameter Range, connect a labeled dataset and train the model by using Tune Model Hyperparameters.

    Note

    If you pass a parameter range to Train Model, it uses only the default value in the single parameter list.

    If you pass a single set of parameter values to the Tune Model Hyperparameters module when it expects a range of settings for each parameter, it ignores the values and uses the default values for the learner.

    If you select the Parameter Range option and enter a single value for any parameter, that single value is used throughout the sweep, even if other parameters change across a range of values.

  9. Submit the pipeline.
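To make the roles of these settings more concrete, here is a minimal sketch of how a number of iterations, a Lambda weight, a random seed, and a parameter range might interact in a linear SVM trainer. It is not the module's implementation: the stochastic subgradient method, the fixed step size, the toy data, and all function names are assumptions made for illustration.

```python
import itertools
import random

def train_linear_svm(rows, labels, iterations=200, lam=0.001, seed=None, step=0.1):
    """Stochastic subgradient descent on hinge loss plus an L1 penalty lam * |w|."""
    rng = random.Random(seed)  # seed=None lets Python pick an OS-provided seed
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(iterations):
        i = rng.randrange(len(rows))
        x, y = rows[i], labels[i]
        score = sum(w * v for w, v in zip(weights, x)) + bias
        violated = y * score < 1  # example is inside the margin or misclassified
        for j, w in enumerate(weights):
            sign = (w > 0) - (w < 0)  # subgradient of |w|
            hinge_grad = -y * x[j] if violated else 0.0
            weights[j] = w - step * (hinge_grad + lam * sign)
        if violated:
            bias += step * y
    return weights, bias

def accuracy(model, rows, labels):
    """Fraction of examples whose predicted class (+1 or -1) matches the label."""
    weights, bias = model
    hits = 0
    for x, y in zip(rows, labels):
        score = sum(w * v for w, v in zip(weights, x)) + bias
        hits += (1 if score >= 0 else -1) == y
    return hits / len(rows)

def sweep(rows, labels, iteration_values, lambda_values, seed):
    """Try every combination of settings and keep the first one with the best accuracy."""
    best = None
    for n_iter, lam in itertools.product(iteration_values, lambda_values):
        model = train_linear_svm(rows, labels, iterations=n_iter, lam=lam, seed=seed)
        result = (accuracy(model, rows, labels), n_iter, lam)
        if best is None or result[0] > best[0]:
            best = result
    return best

# Toy, linearly separable data with labels +1 and -1 (illustrative only).
rows = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0],
        [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0], [-3.0, -2.0]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

# Single-parameter style: one fixed set of values, reproducible through the seed.
model = train_linear_svm(rows, labels, iterations=200, lam=0.001, seed=42)
print(accuracy(model, rows, labels))

# Parameter-range style: the iteration count is a single value held fixed
# while Lambda varies across the sweep.
print(sweep(rows, labels, iteration_values=[200], lambda_values=[0.0001, 0.001, 0.01], seed=42))
```

Calling the trainer twice with the same seed yields identical weights, while omitting the seed can give slightly different results from run to run. For brevity the sketch scores each combination on the training data; a real sweep would evaluate on held-out data.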

Results

After training is complete:

  • To save a snapshot of the trained model, select the Outputs tab in the right panel of the Train Model module. Select the Register dataset icon to save the model as a reusable module.

  • To use the model for scoring, add the Score Model module to a pipeline.

Next steps

See the set of modules available to Azure Machine Learning.