Module: Assign Data to Clusters

This article describes how to use the Assign Data to Clusters module in Azure Machine Learning designer (preview). The module generates predictions by using a clustering model that was trained with the K-means clustering algorithm.

The Assign Data to Clusters module returns a dataset that contains the probable cluster assignment for each new data point.
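
To make the idea of a cluster assignment concrete, the following minimal sketch assigns each new point to its nearest centroid, which is how a K-means model generally scores new data. It is not the module's implementation: the function name `assign_to_clusters`, the sample centroids, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def assign_to_clusters(points, centroids):
    """Return (cluster_index, distance) for each point, using the nearest centroid."""
    results = []
    for point in points:
        # Euclidean distance from this point to every centroid (assumed metric).
        distances = [math.dist(point, centroid) for centroid in centroids]
        nearest = min(range(len(centroids)), key=distances.__getitem__)
        results.append((nearest, distances[nearest]))
    return results


# Hypothetical centroids standing in for a previously trained K-means model.
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 1.0), (9.0, 8.0), (4.0, 3.0)]

assignments = assign_to_clusters(new_points, centroids)
for point, (cluster, distance) in zip(new_points, assignments):
    print(f"{point} -> cluster {cluster} (distance {distance:.2f})")
```

Each new point receives exactly one cluster index, together with its distance to that cluster's center.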

How to use Assign Data to Clusters

  1. In Azure Machine Learning designer, locate a previously trained clustering model. You can create and train a clustering model by using either of the following methods:

    • Configure the K-means clustering algorithm by using the K-Means Clustering module, and train the model by using a dataset and the Train Clustering Model module.

    • Add an existing trained clustering model from the Saved Models group in your workspace.

  2. Attach the trained model to the left input port of Assign Data to Clusters.

  3. Attach a new dataset as input.

    In this dataset, labels are optional. Clustering is generally an unsupervised learning method, so you are not expected to know the categories in advance. However, the input columns must match the columns that were used to train the clustering model; otherwise, an error occurs.

    Tip

    To reduce the number of columns that are written to the designer from the cluster predictions, use the Select Columns in Dataset module to select a subset of the columns.

  4. Leave the Check for append or uncheck for result only check box selected if you want the results to contain the full input dataset plus a column that displays the results (cluster assignments).

    If you clear this check box, only the results are returned. This option might be useful when you create predictions as part of a web service.

  5. Submit the pipeline.
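
Two of the steps above lend themselves to a short illustration: the requirement that the new data carry the columns used in training, and the difference between appending results to the input and returning results only. The sketch below is a hedged approximation in plain Python, not the designer's behavior. The helper names `check_columns` and `build_output`, the sample column names, and the output column name `Assignments` are hypothetical, and the check only detects missing columns.

```python
def check_columns(training_columns, new_columns):
    """Raise ValueError if the new data lacks a column that was used in training."""
    missing = [name for name in training_columns if name not in new_columns]
    if missing:
        raise ValueError(f"Missing columns used in training: {missing}")


def build_output(rows, assignments, append=True):
    """Combine input rows (dicts) with cluster assignments.

    append=True  -> input columns plus an 'Assignments' column (hypothetical name)
    append=False -> only the 'Assignments' column
    """
    output = []
    for row, cluster in zip(rows, assignments):
        result = {"Assignments": cluster}
        output.append({**row, **result} if append else result)
    return output


# Hypothetical training schema, new data, and precomputed cluster assignments.
training_columns = ["height", "weight"]
rows = [{"height": 1.0, "weight": 1.0}, {"height": 9.0, "weight": 8.0}]
assignments = [0, 1]

check_columns(training_columns, rows[0].keys())
full = build_output(rows, assignments, append=True)
results_only = build_output(rows, assignments, append=False)
print(full)
print(results_only)
```

The results-only form yields a smaller payload, which is one reason it can suit a web service response.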

Results

  • To view the values in the dataset, right-click the module, and then select Visualize. Alternatively, select the module, switch to the Outputs tab in the right panel, and click the histogram icon under Port outputs to visualize the result.