映射数据流中的“更改行”转换Alter row transformation in mapping data flow

适用于: Azure 数据工厂 Azure Synapse Analytics

使用“更改行”转换来设置针对行的插入、删除、更新和更新插入策略。Use the Alter Row transformation to set insert, delete, update, and upsert policies on rows. 可以将一对多条件添加为表达式。You can add one-to-many conditions as expressions. 应按优先级顺序指定这些条件,因为每一行都将用与第一个匹配的表达式对应的策略进行标记。These conditions should be specified in order of priority, as each row will be marked with the policy corresponding to the first-matching expression. 其中的每个条件都可能会导致对行执行插入、更新、删除或更新插入操作。Each of those conditions can result in a row (or rows) being inserted, updated, deleted, or upserted. “更改行”可能会产生对数据库的 DDL 和 DML 操作。Alter Row can produce both DDL & DML actions against your database.

“更改行”设置Alter row settings

“更改行”转换将仅对你的数据流中的数据库或 CosmosDB 接收器进行操作。Alter Row transformations will only operate on database or CosmosDB sinks in your data flow. 在调试会话期间,分配给行的操作(插入、更新、删除、更新插入)不会发生。The actions that you assign to rows (insert, update, delete, upsert) won't occur during debug sessions. 在管道中运行“执行数据流”活动,对数据库表执行“更改行”策略。Run an Execute Data Flow activity in a pipeline to enact the alter row policies on your database tables.

指定一个默认行策略Specify a default row policy

创建一个“更改行”转换,并指定一个条件为 true() 的行策略。Create an Alter Row transformation and specify a row policy with a condition of true(). 与前面定义的任何表达式都不匹配的每个行都将标记为执行指定的行策略。Each row that doesn't match any of the previously defined expressions will be marked for the specified row policy. 默认情况下,与任何条件表达式都不匹配的每个行都将标记为执行 InsertBy default, each row that doesn't match any conditional expression will be marked for Insert.

“更改行”策略Alter row policy

备注

若要为所有行标记一个策略,可以为该策略创建一个条件,并将条件指定为 true()To mark all rows with one policy, you can create a condition for that policy and specify the condition as true().

在数据预览中查看策略View policies in data preview

使用调试模式在数据预览窗格中查看“更改行”策略的结果。Use debug mode to view the results of your alter row policies in the data preview pane. “更改行”转换的数据预览不会针对目标产生 DDL 或 DML 操作。A data preview of an alter row transformation won't produce DDL or DML actions against your target.

“更改行”策略Alter row policies

每个“更改行”策略都由一个图标表示,该图标指示是否会执行插入、更新、更新插入或删除操作。Each alter row policy is represented by an icon that indicates whether an insert, update, upsert, or deleted action will occur. 在预览中,顶部标题会显示每个策略所影响的行数。The top header shows how many rows are affected by each policy in the preview.

在接收器中允许“更改行”策略Allow alter row policies in sink

要使“更改行”策略有效,数据流必须写入到数据库或 Cosmos 接收器。For the alter row policies to work, the data stream must write to a database or Cosmos sink. 在接收器的“设置”选项卡中,启用允许该接收器使用的“更改行”策略。In the Settings tab in your sink, enable which alter row policies are allowed for that sink.

“更改行”接收器Alter row sink

默认行为是仅允许插入。The default behavior is to only allow inserts. 若要允许更新、更新插入或删除,请选中接收器中与该条件对应的框。To allow updates, upserts, or deletes, check the box in the sink corresponding to that condition. 如果启用更新、更新插入或删除操作,则必须指定接收器中与之匹配的键列。If updates, upserts, or, deletes are enabled, you must specify which key columns in the sink to match on.

备注

如果插入、更新或更新插入操作修改了接收器中目标表的架构,则数据流会失败。If your inserts, updates, or upserts modify the schema of the target table in the sink, the data flow will fail. 若要修改数据库中的目标架构,请选择“重新创建表”作为表操作。To modify the target schema in your database, choose Recreate table as the table action. 这将删除表并使用新的架构定义重新创建表。This will drop and recreate your table with the new schema definition.

接收器转换需要单个键或一系列键才能在目标数据库中对行进行独一无二的标识。The sink transformation requires either a single key or a series of keys for unique row identification in your target database. 对于 SQL 接收器,请在接收器“设置”选项卡中设置键。对于 CosmosDB,请在“设置”中设置分区键,并且还要在接收器映射中设置 CosmosDB 系统字段“id”。For SQL sinks, set the keys in the sink settings tab. For CosmosDB, set the partition key in the settings and also set the CosmosDB system field "id" in your sink mapping. 对于 CosmosDB,必须包含针对更新、更新插入和删除操作的系统列“id”。For CosmosDB, it is mandatory to include the system column "id" for updates, upserts, and deletes.

对 Azure SQL 数据库和 Synapse 的合并与更新插入Merges and upserts with Azure SQL Database and Synapse

ADF 数据流支持使用更新插入选项对 Azure SQL 数据库和 Synapse 数据库池(数据仓库)执行合并操作。ADF Data Flows supports merges against Azure SQL Database and Synapse database pool (data warehouse) with the upsert option.

但是,你可能会遇到目标数据库架构使用了键列的标识属性的情况。However, you may run into scenarios where your target database schema utilized the identity property of key columns. ADF 要求你标识用来为更新和更新插入操作匹配行值的键。ADF requires you to identify the keys that you will use to match the row values for updates and upserts. 但是,如果目标列设置了标识属性,而你使用的是更新插入策略,则目标数据库不会允许写入到列。But if the target column has the identity property set and you are using the upsert policy, the target database will not allow you to write to the column. 尝试对分布式表的分布列进行更新插入时,也可能会遇到错误。You may also run into errors when you try to upsert against a distributed table's distribution column.

下面是解决该问题的方法:Here are ways to fix that:

  1. 转到“接收器转换设置”并设置“跳过写入键列”。Go to the Sink transformation Settings and set "Skip writing key columns". 这将告诉 ADF 不要写入你选择用作映射键值的列。This will tell ADF to not write the column that you have selected as the key value for your mapping.

  2. 如果该键列不是导致标识列问题的列,则可使用此接收器转换预处理 SQL 选项:SET IDENTITY_INSERT tbl_content ONIf that key column is not the column that is causing the issue for identity columns, then you can use the Sink transformation pre-processing SQL option: SET IDENTITY_INSERT tbl_content ON. 然后,通过以下后处理 SQL 属性将其关闭:SET IDENTITY_INSERT tbl_content OFFThen, turn it off with the post-processing SQL property: SET IDENTITY_INSERT tbl_content OFF.

  3. 对于标识案例和分布列案例,可以使用“条件拆分”转换将逻辑从更新插入切换为使用单独的更新条件和单独的插入条件。For both the identity case and the distribution column case, you can switch your logic from Upsert to using a separate update condition and a separate insert condition using a Conditional Split transformation. 这样即可将更新路径上的映射设置为忽略键列映射。This way, you can set the mapping on the update path to ignore the key column mapping.

数据流脚本Data flow script

语法Syntax

<incomingStream>
    alterRow(
           insertIf(<condition>?),
           updateIf(<condition>?),
           deleteIf(<condition>?),
           upsertIf(<condition>?),
        ) ~> <alterRowTransformationName>

示例Example

下面的示例是一个名为 CleanData 的“更改行”转换,它接受传入的流 SpecifyUpsertConditions 并创建三个“更改行”条件。The below example is an alter row transformation named CleanData that takes an incoming stream SpecifyUpsertConditions and creates three alter row conditions. 在上一转换中计算了名为 alterRowCondition 的列,目的是确定是否在数据库中插入、更新或删除行。In the previous transformation, a column named alterRowCondition is calculated that determines whether or not a row is inserted, updated, or deleted in the database. 如果列的值包含与“更改行”规则匹配的字符串值,则会为其分配该策略。If the value of the column has a string value that matches the alter row rule, it is assigned that policy.

在数据工厂 UX 中,此转换如下图所示:In the Data Factory UX, this transformation looks like the below image:

“更改行”示例Alter row example

此转换的数据流脚本位于下面的代码片段中:The data flow script for this transformation is in the snippet below:

SpecifyUpsertConditions alterRow(insertIf(alterRowCondition == 'insert'),
    updateIf(alterRowCondition == 'update'),
    deleteIf(alterRowCondition == 'delete')) ~> AlterRow

后续步骤Next steps

完成“更改行”转换后,即可将数据接收到目标数据存储中After the Alter Row transformation, you may want to sink your data into a destination data store.