Train Clustering Model

This article describes a module in Azure Machine Learning designer (preview).

Use this module to train a clustering model.

The module takes an untrained clustering model that you have already configured with the K-Means Clustering module, and trains it on a labeled or unlabeled dataset. It produces two outputs: a trained model that you can use for prediction, and a set of cluster assignments for each case in the training data.

Note

A clustering model cannot be trained using the Train Model module, the generic module for training machine learning models, because Train Model works only with supervised learning algorithms. K-means and other clustering algorithms allow unsupervised learning, meaning that the algorithm can learn from unlabeled data.
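
To illustrate what learning from unlabeled data means, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the designer module's implementation; the function name `kmeans`, the sample points, and the hand-picked initial centroids are all assumptions made for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Tiny Lloyd's-algorithm sketch: only feature values are used, no labels."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
print(assignments)
```

The algorithm never sees a label column; the groups emerge only from the distances between feature values.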

How to use Train Clustering Model

  1. Add the Train Clustering Model module to your pipeline in the designer. You can find the module under Machine Learning Modules, in the Train category.

  2. Add the K-Means Clustering module, or another custom module that creates a compatible clustering model, and set the parameters of the clustering model.

  3. Attach a training dataset to the right-hand input of Train Clustering Model.

  4. In Column Set, select the columns from the dataset to use in building clusters. Be sure to select columns that make good features: for example, avoid IDs or other columns in which every value is unique, and columns in which every value is the same.

    If a label is available, you can either use it as a feature or leave it out.

  5. Select the option "Check for append or uncheck for result only" if you want to output the training data together with the new cluster label.

    If you deselect this option, only the cluster assignments are output.

  6. Submit the pipeline, or click the Train Clustering Model module and select Run Selected.
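
The column advice in step 4 and the output option in step 5 can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `flag_suspect_columns` and `build_output`, the sample rows, the cluster assignments, and the `Assignments` column name are hypothetical and are not taken from the designer.

```python
def flag_suspect_columns(rows):
    """Flag columns that are constant or unique per row (ID-like)."""
    suspects = {}
    for column in rows[0]:
        distinct = {row[column] for row in rows}
        if len(distinct) == 1:
            suspects[column] = "constant"
        elif len(distinct) == len(rows):
            suspects[column] = "all values unique"
    return suspects

def build_output(rows, assignments, append=True, label_column="Assignments"):
    """Mimic the append option: full rows plus cluster label, or labels only."""
    if append:
        return [{**row, label_column: a} for row, a in zip(rows, assignments)]
    return [{label_column: a} for a in assignments]

# Hypothetical training data and hypothetical cluster assignments.
rows = [
    {"customer_id": 101, "region": "west", "visits": 3, "country": "US"},
    {"customer_id": 102, "region": "east", "visits": 7, "country": "US"},
    {"customer_id": 103, "region": "west", "visits": 3, "country": "US"},
    {"customer_id": 104, "region": "east", "visits": 8, "country": "US"},
]
assignments = [0, 1, 0, 1]

print(flag_suspect_columns(rows))
print(build_output(rows, assignments, append=True)[0])
print(build_output(rows, assignments, append=False)[0])
```

The uniqueness check is a crude heuristic: a continuous numeric feature can legitimately have a different value in every row, so treat flagged columns as candidates for review rather than automatic exclusions.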

Results

After training has completed:

  • To save a snapshot of the trained model, select the Outputs tab in the right panel of the Train Clustering Model module. Select the Register dataset icon to save the model as a reusable module.

  • To generate scores from the model, use Assign Data to Clusters.
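
Conceptually, scoring new data against a trained k-means model means assigning each row to its nearest centroid. The following is a minimal sketch in plain Python, assuming hypothetical trained centroids and a hypothetical helper named `assign_to_clusters`; it does not reproduce the implementation or output format of Assign Data to Clusters.

```python
import math

def assign_to_clusters(centroids, points):
    """Return (cluster index, distance to that centroid) for each point."""
    results = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        nearest = min(range(len(centroids)), key=distances.__getitem__)
        results.append((nearest, distances[nearest]))
    return results

# Hypothetical centroids from a previously trained model, and new rows to score.
trained_centroids = [(1.0, 1.0), (5.0, 5.0)]
new_points = [(0.9, 1.2), (4.0, 5.0)]

for cluster, distance in assign_to_clusters(trained_centroids, new_points):
    print(cluster, round(distance, 3))
```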

Next steps

See the set of modules available to Azure Machine Learning.