Add Columns module

This article describes a module in Azure Machine Learning designer.

Use this module to concatenate two datasets. The module combines all columns from the two datasets that you specify as inputs into a single dataset. If you need to concatenate more than two datasets, use several instances of Add Columns.
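As a rough illustration of chaining several two-input steps, the following plain-Python sketch models a dataset as a dictionary that maps column names to lists of values. The helper `add_columns` and the sample data are hypothetical and are not part of Azure Machine Learning. The sketch assumes that all column names are unique and that all inputs have the same number of rows.

```python
from functools import reduce


def add_columns(left, right):
    """Combine all columns of two column-oriented tables (hypothetical helper)."""
    overlap = set(left) & set(right)
    if overlap:
        # This sketch assumes unique column names across inputs.
        raise ValueError(f"duplicate column names: {sorted(overlap)}")
    combined = dict(left)    # columns of the left input come first
    combined.update(right)   # followed by the columns of the right input
    return combined


ages = {"age": [34, 51]}
incomes = {"income": [72000, 58000]}
labels = {"label": ["yes", "no"]}

# Three inputs require two chained two-input steps.
result = reduce(add_columns, [ages, incomes, labels])
print(list(result))  # ['age', 'income', 'label']
```

Each call takes exactly two inputs, which mirrors the two input ports of the module. Combining three datasets therefore takes two chained calls.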

How to configure Add Columns

  1. Add the Add Columns module to your pipeline.

  2. Connect the two datasets that you want to concatenate. If you want to combine more than two datasets, you can chain together several instances of Add Columns.

    • You can combine two datasets whose columns have different numbers of rows. In the output dataset, the columns from the smaller dataset are padded with missing values for each row they lack.

    • You cannot choose individual columns to add. All the columns from each dataset are concatenated when you use Add Columns. Therefore, if you want to add only a subset of the columns, first use Select Columns in Dataset to create a dataset that contains only the columns you want.

  3. Submit the pipeline.
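The two notes under step 2 can be sketched in plain Python. This is a minimal illustration of the described behavior, not the module's implementation. The helpers `select_columns` and `add_columns_padded` and the sample data are hypothetical. A dataset is modeled as a dictionary of column name to list of values, `None` stands in for a missing value, and column names are assumed to be unique.

```python
def select_columns(table, names):
    """Keep only the named columns (stands in for Select Columns in Dataset)."""
    return {name: table[name] for name in names}


def add_columns_padded(left, right):
    """Combine all columns of both tables, padding shorter columns with None."""
    columns = {**left, **right}  # assumes no shared column names
    n_rows = max((len(values) for values in columns.values()), default=0)
    return {
        name: values + [None] * (n_rows - len(values))
        for name, values in columns.items()
    }


customers = {"id": [1, 2, 3], "city": ["Oslo", "Lima", "Pune"]}
scores = {"score": [0.9, 0.4], "notes": ["a", "b"]}

# Only "score" is wanted, so the subset is selected before combining.
combined = add_columns_padded(customers, select_columns(scores, ["score"]))
print(combined["score"])  # [0.9, 0.4, None]
```

Selecting the subset first keeps the combining step simple: it always takes every column from both of its inputs.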

Results

After the pipeline has run:

  • To see the first rows of the new dataset, right-click the Add Columns module and select Visualize. Alternatively, select the module, switch to the Outputs tab in the right panel, and click the histogram icon under Port outputs to visualize the result.

The number of columns in the new dataset equals the sum of the numbers of columns in the two input datasets.

If a column name appears in both input datasets, a numeric suffix is added to each of the two column names. For example, if both datasets contain a column named TargetOutcome, the column from the left dataset is renamed TargetOutcome_1 and the column from the right dataset is renamed TargetOutcome_2.
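The column count and the renaming rule can be illustrated with a short plain-Python sketch. The helper `add_columns_with_suffixes` and the sample data are hypothetical. The sketch only mimics the behavior described above and assumes that a suffixed name does not collide with another existing column name.

```python
def add_columns_with_suffixes(left, right):
    """Combine all columns; shared names get _1 (left) and _2 (right)."""
    shared = set(left) & set(right)
    combined = {}
    for name, values in left.items():
        combined[f"{name}_1" if name in shared else name] = values
    for name, values in right.items():
        combined[f"{name}_2" if name in shared else name] = values
    return combined


left = {"Age": [34, 51], "TargetOutcome": [1, 0]}
right = {"TargetOutcome": [0, 0], "Region": ["N", "S"]}

combined = add_columns_with_suffixes(left, right)
print(list(combined))
# ['Age', 'TargetOutcome_1', 'TargetOutcome_2', 'Region']
print(len(combined) == len(left) + len(right))  # True
```

Because both copies of a shared column are kept under distinct names, no column is dropped and the output width is always the sum of the two input widths.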

Next steps

See the set of modules available to Azure Machine Learning.