映射数据流中的派生列转换Derived column transformation in mapping data flow

适用于: Azure 数据工厂 Azure Synapse Analytics

使用派生列转换在数据流中生成新列或修改现有字段。Use the derived column transformation to generate new columns in your data flow or to modify existing fields.

创建和更新列Create and update columns

创建派生列时,可以生成新列或更新现有列。When creating a derived column, you can either generate a new column or update an existing one. 在“列”文本框中,输入要创建的列。In the Column textbox, enter in the column you are creating. 若要替代架构中的现有列,可以使用“列”下拉列表。To override an existing column in your schema, you can use the column dropdown. 若要生成派生列的表达式,请单击“输入表达式”文本框。To build the derived column's expression, click on the Enter expression textbox. 你可以开始键入表达式或打开表达式生成器来构造逻辑。You can either start typing your expression or open up the expression builder to construct your logic.

派生列设置Derived column settings

若要添加更多派生列,请单击列列表上方的“添加”或现有派生列旁边的加号图标。To add more derived columns, click on Add above the column list or the plus icon next to an existing derived column. 选择“添加列”或“添加列模式”。Choose either Add column or Add column pattern.

新派生列选择New derived column selection

列模式Column patterns

如果未显式定义架构,或者你想要批量更新一组列,则需要创建列模式。In cases where your schema is not explicitly defined or if you want to update a set of columns in bulk, you will want to create column patters. 列模式让你可以使用基于列元数据的规则与列匹配,并为每个匹配列创建派生列。Column patterns allow for you to match columns using rules based upon the column metadata and create derived columns for each matched column. 如需了解更多信息,请参阅在派生列转换过程中如何生成列模式For more information, learn how to build column patterns in the derived column transformation.

列模式Column patterns

使用表达式生成器生成架构Building schemas using the expression builder

使用映射数据流表达式生成器时,可以在“派生列”部分中创建、编辑和管理派生列。When using the mapping data flow expression builder, you can create, edit, and manage your derived columns in the Derived Columns section. 系统将列出转换过程中创建或更改的所有列。All columns that are created or changed in the transformation are listed. 你可以单击列名称,以交互方式选择要编辑的列或模式。Interactively choose which column or pattern you are editing by clicking on the column name. 若要添加其他列,请选择“新建”,并选择是否要添加单个列或模式。To add an additional column select Create new and choose whether you wish to add a single column or a pattern.

新建列Create new column

使用复杂列时,可以创建子列。When working with complex columns, you can create subcolumns. 要创建子列,请单击任意列旁边的加号图标,然后选择“添加子列”。To do this, click on the plus icon next to any column and select Add subcolumn. 有关处理数据流中的复杂类型的详细信息,请参阅映射数据流中的 JSON 处理For more information on handling complex types in data flow, see JSON handling in mapping data flow.

添加子列Add subcolumn

有关处理数据流中的复杂类型的详细信息,请参阅映射数据流中的 JSON 处理For more information on handling complex types in data flow, see JSON handling in mapping data flow.

添加复杂列Add complex column

局部变量Locals

如果要在多个列之间共享逻辑或想要划分逻辑,可以在派生列转换过程中创建一个局部变量。If you are sharing logic across multiple columns or want to compartmentalize your logic, you can create a local within a derived column transformation. 局部变量是一组逻辑,不会传播到下游的以下转换。A local is a set of logic that doesn't get propagated downstream to the following transformation. 通过转到表达式元素并选择“局部变量”,可以在表达式生成器中创建局部变量 。Locals can be created within the expression builder by going to Expression elements and selecting Locals. 选择“新建”可以创建新的局部变量。Create a new one by selecting Create new.

创建局部变量Create local

局部变量可以引用派生列中的任何表达式元素,包括函数、输入架构、参数和其他局部变量。Locals can reference any expression element a derived column including functions, input schema, parameters, and other locals. 引用其他局部变量时,顺序很重要,因为引用的局部变量需要“高于”当前的局部变量。When referencing other locals, order does matter as the referenced local needs to be "above" the current one.

创建局部变量 2Create local 2

若要引用派生列中的局部变量,请单击“表达式元素”视图中的局部变量,或者在其名称前面用冒号引用它。To reference a local in a derived column, either click on the local from the Expression elements view or reference it with a colon in front of its name. 例如,名为 local1 的局部变量将由 :local1 引用。For example, a local called local1 would be referenced by :local1. 若要编辑局部变量的定义,请将鼠标悬停在“表达式元素”视图中,然后单击铅笔图标。To edit a local definition, hover over it in the expression elements view and click on the pencil icon.

使用局部变量Using locals

数据流脚本Data flow script

语法Syntax

<incomingStream>
    derive(
           <columnName1> = <expression1>,
           <columnName2> = <expression2>,
           each(
                match(matchExpression),
                <metadataColumn1> = <metadataExpression1>,
                <metadataColumn2> = <metadataExpression2>
               )
          ) ~> <deriveTransformationName>

示例Example

下面的示例是一个名为 CleanData 的派生列,它接受传入的流 MoviesYear 并创建两个派生列。The below example is a derived column named CleanData that takes an incoming stream MoviesYear and creates two derived columns. 第一个派生列将列 Rating 替换为整数类型的 Rating 值。The first derived column replaces column Rating with Rating's value as an integer type. 第二个派生列是与名称以“movies”开头的每个列匹配的模式。The second derived column is a pattern that matches each column whose name starts with 'movies'. 对于每个匹配列,它会创建列 movie,该列等于前缀为“movie_”的匹配列的值。For each matched column, it creates a column movie that is equal to the value of the matched column prefixed with 'movie_'.

在数据工厂 UX 中,此转换如下图所示:In the Data Factory UX, this transformation looks like the below image:

派生示例Derive example

此转换的数据流脚本位于下面的代码片段中:The data flow script for this transformation is in the snippet below:

MoviesYear derive(
                Rating = toInteger(Rating),
                each(
                    match(startsWith(name,'movies')),
                    'movie' = 'movie_' + toString($$)
                )
            ) ~> CleanData

后续步骤Next steps