Select Columns in Dataset module

This article describes a module in Azure Machine Learning designer.

Use this module to choose a subset of columns to use in downstream operations. The module does not physically remove the columns from the source dataset; instead, it creates a subset of columns, much like a database view or projection.

This module is useful when you need to limit the columns available for a downstream operation, or if you want to reduce the size of the dataset by removing unneeded columns.

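The columns in the dataset are output in the same order as in the original data, even if you specify them in a different order. The exception is the option Allow duplicates and preserve column order in selection, which outputs columns in the order that you list them.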
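
The projection behavior described above can be illustrated with a minimal Python sketch. This is not the designer's implementation; the function name `select_columns` and the sample data are hypothetical, and the sketch only assumes a header list plus row lists.

```python
def select_columns(header, rows, wanted):
    """Return a projected header and rows, keeping the source column order.

    Hypothetical helper for illustration; the source data is not modified.
    """
    wanted_set = set(wanted)
    unknown = wanted_set - set(header)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    # Walk the source header so the output follows the original order.
    keep = [i for i, name in enumerate(header) if name in wanted_set]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Hypothetical sample data.
header = ["make", "fuel-type", "horsepower", "price"]
rows = [["audi", "gas", 102, 13950], ["bmw", "gas", 101, 16430]]

# Columns are requested in a different order than they appear in the source.
new_header, new_rows = select_columns(header, rows, ["price", "make"])
print(new_header)    # ['make', 'price']
print(len(rows[0]))  # 4 (the source rows still have every column)
```

The requested order (`price`, `make`) is ignored in favor of the source order, and the source rows keep all four columns.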

How to use

This module has no parameters. You use the column selector to choose the columns to include or exclude.

Choose columns by name

There are multiple options in the module for choosing columns by name:

  • Filter and search

    Click the BY NAME option.

    If you have connected a dataset that is already populated, a list of available columns should appear. If no columns appear, you might need to run upstream modules to view the column list.

    To filter the list, type in the search box. For example, if you type the letter w in the search box, the list is filtered to show the column names that contain the letter w.

    Select columns and click the right arrow button to move the selected columns to the list in the right-hand pane.

    • To select a continuous range of column names, press Shift + Click.
    • To add individual columns to the selection, press Ctrl + Click.

    Click the checkmark button to save and close.

  • Use names in combination with other rules

    Click the WITH RULES option.

    Choose a rule, such as showing columns of a specific data type.

    Then, click individual columns of that type by name, to add them to the selection list.

  • Type or paste a comma-separated list of column names

    If your dataset is wide, it might be easier to use indexes or generated lists of names, rather than selecting columns individually. Assuming you have prepared the list in advance:

    1. Click the WITH RULES option.
    2. Select No columns, select Include, and then click inside the text box with the red exclamation mark.
    3. Paste in or type a comma-separated list of previously validated column names. You cannot save the module if any column has an invalid name, so be sure to check the names beforehand.

    You can also use this method to specify a list of columns using their index values.
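
Because the module cannot be saved when the pasted list contains an invalid name, it can help to check a prepared list before pasting it. The following is a minimal sketch of such a check, together with a substring filter similar to the search box. The helper name `parse_name_list` and the column names are hypothetical, and the case-sensitive matching is an assumption, since the text does not say how the designer treats case.

```python
def parse_name_list(text, available):
    """Split a comma-separated list and report names missing from `available`.

    Hypothetical helper; matching is case-sensitive by assumption.
    """
    names = [part.strip() for part in text.split(",") if part.strip()]
    invalid = [name for name in names if name not in available]
    return names, invalid


# Hypothetical column names of the connected dataset.
available = ["make", "body-style", "wheel-base", "curb-weight", "price"]

names, invalid = parse_name_list("make, wheel-base, prise", available)
print(names)    # ['make', 'wheel-base', 'prise']
print(invalid)  # ['prise']

# Substring filter, comparable to typing "w" in the search box.
matches = [name for name in available if "w" in name]
print(matches)  # ['wheel-base', 'curb-weight']
```

An empty `invalid` list suggests the names are safe to paste; any entry in it (here the misspelled `prise`) should be corrected first.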

Choose by type

If you use the WITH RULES option, you can apply multiple conditions on the column selections. For example, you might need to get only feature columns of a numeric data type.

The BEGIN WITH option determines your starting point and is important for understanding the results.

  • If you select the ALL COLUMNS option, all columns are added to the list. Then, you must use the Exclude option to remove columns that meet certain conditions.

    For example, you might start with all columns and then remove columns by name, or by type.

  • If you select the NO COLUMNS option, the list of columns starts out empty. You then specify conditions to add columns to the list.

    If you apply multiple rules, each condition is additive. For example, say you start with no columns, and then add a rule to get all numeric columns. In the Automobile price dataset, that results in 16 columns. Then, you click the + sign to add a new condition, and select Include all features. The resulting dataset includes all the numeric columns, plus all the feature columns, including some string feature columns.
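
The effect of the BEGIN WITH starting point and of additive rules can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Column` type, the `apply_rules` function, the sample columns and their flags are all hypothetical, and the sketch assumes that rules are applied one after another in the order given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    is_numeric: bool
    is_feature: bool


def apply_rules(columns, begin_with, rules):
    """Apply (action, predicate) rules in order; return names in source order.

    begin_with is "all" or "none"; action is "include" or "exclude".
    Hypothetical sketch, not the designer's rule engine.
    """
    if begin_with not in ("all", "none"):
        raise ValueError(f"unknown starting point: {begin_with!r}")
    selected = {c.name for c in columns} if begin_with == "all" else set()
    for action, predicate in rules:
        matched = {c.name for c in columns if predicate(c)}
        if action == "include":
            selected |= matched  # additive: a union with earlier matches
        elif action == "exclude":
            selected -= matched
        else:
            raise ValueError(f"unknown action: {action!r}")
    return [c.name for c in columns if c.name in selected]


# Hypothetical columns with made-up type and feature flags.
columns = [
    Column("make", is_numeric=False, is_feature=True),
    Column("horsepower", is_numeric=True, is_feature=True),
    Column("price", is_numeric=True, is_feature=False),
    Column("notes", is_numeric=False, is_feature=False),
]


def numeric(c):
    return c.is_numeric


def feature(c):
    return c.is_feature


print(apply_rules(columns, "none", [("include", numeric)]))
# ['horsepower', 'price']
print(apply_rules(columns, "none", [("include", numeric), ("include", feature)]))
# ['make', 'horsepower', 'price']
print(apply_rules(columns, "all", [("exclude", numeric)]))
# ['make', 'notes']
```

In the second call, the string feature column `make` is added by the second rule even though the first rule selected only numeric columns, which mirrors the additive behavior described above.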

Choose by column index

The column index refers to the order of the column within the original dataset.

  • Columns are numbered sequentially starting at 1.
  • To get a range of columns, use a hyphen.
  • Open-ended specifications such as 1- or -3 are not allowed.
  • Duplicate index values (or column names) are not allowed, and might result in an error.

For example, assuming your dataset has at least eight columns, you could paste in any of the following examples to return multiple non-contiguous columns:

  • 8,1-4,6
  • 1,3-8
  • 1,3-6,4

The final example does not result in an error; however, it returns a single instance of column 4.
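
The index rules above can be expressed as a small parser. The following Python sketch is one possible reading of those rules, not the module's actual code. The function name `parse_index_spec` is hypothetical, and collapsing overlapping values into a single instance is modeled on the behavior described for the final example.

```python
import re


def parse_index_spec(spec, n_columns):
    """Expand a spec such as '8,1-4,6' into sorted, unique 1-based indexes.

    Hypothetical helper; rejects open-ended ranges such as '1-' or '-3'.
    """
    picked = set()
    for raw in spec.split(","):
        token = raw.strip()
        match = re.fullmatch(r"(\d+)(?:-(\d+))?", token)
        if match is None:
            raise ValueError(f"invalid token: {token!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if start < 1 or end > n_columns or start > end:
            raise ValueError(f"range out of bounds: {token!r}")
        # A set keeps one instance of any index that is listed twice.
        picked.update(range(start, end + 1))
    return sorted(picked)


print(parse_index_spec("8,1-4,6", 8))  # [1, 2, 3, 4, 6, 8]
print(parse_index_spec("1,3-8", 8))    # [1, 3, 4, 5, 6, 7, 8]
print(parse_index_spec("1,3-6,4", 8))  # [1, 3, 4, 5, 6]
```

The result is sorted because, by default, columns are returned in their original order regardless of the order in which the indexes are listed.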

Change order of columns

The option Allow duplicates and preserve column order in selection starts with an empty list, and adds columns that you specify by name or by index. Unlike other options, which always return columns in their "natural order", this option outputs the columns in the order that you name or list them.

For example, in a dataset with the columns Col1, Col2, Col3, and Col4, you could reverse the order of the columns and leave out column 2, by specifying either of the following lists:

  • Col4, Col3, Col1
  • 4,3,1
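
The order-preserving behavior can be sketched in Python as follows. The function name `select_in_listed_order` is hypothetical, and the sketch assumes that a token made only of digits is a 1-based index rather than a column name.

```python
def select_in_listed_order(header, spec):
    """Resolve a comma-separated list of names or 1-based indexes in listed order.

    Hypothetical helper; duplicates are kept, and digit-only tokens are
    treated as indexes by assumption.
    """
    result = []
    for raw in spec.split(","):
        token = raw.strip()
        if token.isdecimal():
            index = int(token)
            if not 1 <= index <= len(header):
                raise IndexError(f"column index out of range: {index}")
            result.append(header[index - 1])
        elif token in header:
            result.append(token)
        else:
            raise KeyError(f"unknown column: {token!r}")
    return result


header = ["Col1", "Col2", "Col3", "Col4"]
print(select_in_listed_order(header, "Col4, Col3, Col1"))  # ['Col4', 'Col3', 'Col1']
print(select_in_listed_order(header, "4,3,1"))             # ['Col4', 'Col3', 'Col1']
print(select_in_listed_order(header, "1,1"))               # ['Col1', 'Col1']
```

Both lists from the example produce the same reversed selection without Col2, and the last call shows a duplicated column being kept rather than collapsed.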

