映射数据流中的 select 转换Select transformation in mapping data flow

适用于: Azure 数据工厂 Azure Synapse Analytics

使用 select 转换对列执行重命名、删除或重新排序操作。Use the select transformation to rename, drop, or reorder columns. 此转换不会更改行数据,但会选择哪些列是在下游传播的。This transformation doesn't alter row data, but chooses which columns are propagated downstream.

在 select 转换中,用户可指定固定映射、使用模式来执行基于规则的映射,也可启用自动映射。In a select transformation, users can specify fixed mappings, use patterns to do rule-based mapping, or enable auto mapping. 可在同一 select 转换中同时使用固定映射和基于规则的映射。Fixed and rule-based mappings can both be used within the same select transformation. 如果某个列与定义的某个映射不匹配,则该列将被删除。If a column doesn't match one of the defined mappings, it will be dropped.

固定映射Fixed mapping

如果投影中定义的列少于 50,则默认情况下,所有定义的列都将具有固定映射。If there are fewer than 50 columns defined in your projection, all defined columns will have a fixed mapping by default. 固定映射采用已定义的传入列,并将其映射为精确名称。A fixed mapping takes a defined, incoming column and maps it an exact name.

固定映射Fixed mapping

备注

不能使用固定映射来映射或重命名偏移列You can't map or rename a drifted column using a fixed mapping

映射分层列Mapping hierarchical columns

固定映射可用于将分层列的子列映射到顶级列。Fixed mappings can be used to map a subcolumn of a hierarchical column to a top-level column. 如果具有已定义的层次结构,请使用列下拉列表选择子列。If you have a defined hierarchy, use the column dropdown to select a subcolumn. select 转换将创建一个包含子列的值和数据类型的新列。The select transformation will create a new column with the value and data type of the subcolumn.

分层映射hierarchical mapping

基于规则的映射Rule-based mapping

如果希望一次映射多个列或将偏移列传递到下游,请使用基于规则的映射来定义使用列模式的映射。If you wish to map many columns at once or pass drifted columns downstream, use rule-based mapping to define your mappings using column patterns. 基于列的 nametypestreamposition 进行匹配。Match based on the name, type, stream, and position of columns. 可使用固定映射和基于规则的映射的任意组合。You can have any combination of fixed and rule-based mappings. 默认情况下,列数大于 50 的投影都将默认为基于规则的映射,其在每列上进行匹配并输出所输入的名称。By default, all projections with greater than 50 columns will default to a rule-based mapping that matches on every column and outputs the inputted name.

若要添加基于规则的映射,请单击“添加映射”,然后选择“基于规则的映射” 。To add a rule-based mapping, click Add mapping and select Rule-based mapping.

屏幕截图显示从“添加映射”中选择的“基于规则的映射”。Screenshot shows Rule-based mapping selected from Add mapping.

每个基于规则的映射需要两个输入:要匹配的条件以及每个映射列的名称。Each rule-based mapping requires two inputs: the condition on which to match by and what to name each mapped column. 这两个值都是通过表达式生成器输入的。Both values are inputted via the expression builder. 在左侧表达式框中,输入布尔值匹配条件。In the left expression box, enter your boolean match condition. 在右侧的表达式框中,指定要将匹配的列映射到的项。In the right expression box, specify what the matched column will be mapped to.

屏幕截图显示映射。Screenshot shows a mapping.

使用 $$ 语法来引用匹配列的输入名称。Use $$ syntax to reference the input name of a matched column. 以上图为例,假设用户想要在名称长度不到 6 个字符的所有字符串列上进行匹配。Using the above image as an example, say a user wants to match on all string columns whose names are shorter than six characters. 如果一个传入列被命名为 test,则表达式 $$ + '_short' 会将列重命名为 test_shortIf one incoming column was named test, the expression $$ + '_short' will rename the column test_short. 如果这是存在的唯一映射,则所有不符合该条件的列都将从输出的数据中删除。If that's the only mapping that exists, all columns that don't meet the condition will be dropped from the outputted data.

模式同时匹配偏移列和已定义的列。Patterns match both drifted and defined columns. 若要查看由规则映射的已定义的列,请单击规则旁边的眼镜图标。To see which defined columns are mapped by a rule, click the eyeglasses icon next to the rule. 使用数据预览来验证输出。Verify your output using data preview.

正则表达式映射Regex mapping

如果单击向下的 V 形图标,可指定正则表达式映射条件。If you click the downward chevron icon, you can specify a regex-mapping condition. 对于与指定的正则表达式条件匹配的列名称,正则表达式映射条件与所有这些名称全部匹配。A regex-mapping condition matches all column names that match the specified regex condition. 这可与标准的基于规则的映射结合使用。This can be used in combination with standard rule-based mappings.

屏幕截图显示具有层次结构级别和名称匹配的正则表达式映射条件。Screenshot shows the regex-mapping condition with Hierarchy level and Name matches.

上述示例与正则表达式模式 (r) 或任何包含小写 r 的列名匹配。The above example matches on regex pattern (r) or any column name that contains a lower case r. 与标准的基于规则的映射类似,所有匹配的列都使用 $$ 语法由右侧的条件更改。Similar to standard rule-based mapping, all matched columns are altered by the condition on the right using $$ syntax.

如果列名称中有多个正则表达式匹配项,可使用 $n 来指示特定匹配项(其中“n”表示具体的匹配项)。If you have multiple regex matches in your column name, you can refer to specific matches using $n where 'n' refers to which match. 例如,“$2”指列名称中的第二个匹配项。For example, '$2' refers to the second match within a column name.

基于规则的层次结构Rule-based hierarchies

如果定义的投影具有层次结构,则可使用基于规则的映射来映射层次结构子列。If your defined projection has a hierarchy, you can use rule-based mapping to map the hierarchies subcolumns. 指定匹配条件和要映射其子列的复杂列。Specify a matching condition and the complex column whose subcolumns you wish to map. 每个匹配的子列都将使用右侧指定的“名称为”规则进行输出。Every matched subcolumn will be outputted using the 'Name as' rule specified on the right.

屏幕截图显示对层次结构使用的基于规则的映射。Screenshot shows a rule-based mapping using for a hierarchy.

上面的示例与复杂列 a 的所有子列匹配。The above example matches on all subcolumns of complex column a. a 包含两个子列 - bca contains two subcolumns b and c. “名称为”条件是 $$因此输出架构将包含两个列 - bcThe output schema will include two columns b and c as the 'Name as' condition is $$.

参数化Parameterization

可使用基于规则的映射对列名称进行参数化。You can parameterize column names using rule-based mapping. 使用关键字 name 将传入列名称与参数匹配。Use the keyword name to match incoming column names against a parameter. 例如,如果你具数据流参数 mycolumn,则可创建匹配等于 mycolumn 的任何列名称的规则。For example, if you have a data flow parameter mycolumn, you can create a rule that matches any column name that is equal to mycolumn. 可将匹配的列重命名为硬编码的字符串(例如“业务键”),并显式引用它。You can rename the matched column to a hard-coded string such as 'business key' and reference it explicitly. 在此示例中,匹配条件为 name == $mycolumn,名称条件为“业务键”。In this example, the matching condition is name == $mycolumn and the name condition is 'business key'.

自动映射Auto mapping

添加 select 转换时,可切换“自动映射”滑块来启用自动映射。When adding a select transformation, Auto mapping can be enabled by switching the Auto mapping slider. 启用自动映射后,select 转换会映射所有传入列(不包括重复列),其名称与它们的输入相同。With auto mapping, the select transformation maps all incoming columns, excluding duplicates, with the same name as their input. 这将包括偏移列,这意味着输出数据可能包含未在架构中定义的列。This will include drifted columns, which means the output data may contain columns not defined in your schema. 有关偏移列的详细信息,请参阅架构偏差For more information on drifted columns, see schema drift.

自动映射Auto mapping

启用自动映射后,select 转换将遵循跳过重复项设置,并为现有列提供新的别名。With auto mapping on, the select transformation will honor the skip duplicate settings and provide a new alias for the existing columns. 在同一流和自联接方案中执行多个联接或查找时,别名非常有用。Aliasing is useful when doing multiple joins or lookups on the same stream and in self-join scenarios.

重复列Duplicate columns

默认情况下,select 转换会删除输入和输出投影中的重复列。By default, the select transformation drops duplicate columns in both the input and output projection. 重复输入列通常来自联接和查找转换,其中列名称在联接的每一侧都是重复的。Duplicate input columns often come from join and lookup transformations where column names are duplicated on each side of the join. 如果将两个不同的输入列映射到同一名称,则可能产生重复输出列。Duplicate output columns can occur if you map two different input columns to the same name. 通过切换复选框选择是删除还是传递重复列。Choose whether to drop or pass on duplicate columns by toggling the checkbox.

跳过重复项Skip Duplicates

列排序Ordering of columns

映射的顺序决定了输出列的顺序。The order of mappings determines the order of the output columns. 如果多次映射某个输入列,则只采用第一个映射。If an input column is mapped multiple times, only the first mapping will be honored. 在删除重复列时,将保留第一个匹配项。For any duplicate column dropping, the first match will be kept.

数据流脚本Data flow script

语法Syntax

<incomingStream>
    select(mapColumn(
        each(<hierarchicalColumn>, match(<matchCondition>), <nameCondition> = $$), ## hierarchical rule-based matching
        <fixedColumn>, ## fixed mapping, no rename
        <renamedFixedColumn> = <fixedColumn>, ## fixed mapping, rename
        each(match(<matchCondition>), <nameCondition> = $$), ## rule-based mapping
        each(patternMatch(<regexMatching>), <nameCondition> = $$) ## regex mapping
    ),
    skipDuplicateMapInputs: { true | false },
    skipDuplicateMapOutputs: { true | false }) ~> <selectTransformationName>

示例Example

下面是 select 映射及其数据流脚本的一个示例:Below is an example of a select mapping and its data flow script:

选择脚本示例Select script example

DerivedColumn1 select(mapColumn(
        each(a, match(true())),
        movie,
        title1 = title,
        each(match(name == 'Rating')),
        each(patternMatch(`(y)`),
            $1 + 'regex' = $$)
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> Select1

后续步骤Next steps

  • 使用 select 来对列执行重命名、重新排序和别名设置操作后,使用接收器转换将数据放入数据存储中。After using Select to rename, reorder, and alias columns, use the Sink transformation to land your data into a data store.