Handle SQL truncation error rows in Data Factory mapping data flows

Applies to: Azure Data Factory and Azure Synapse Analytics

A common scenario in Data Factory when using mapping data flows is to write your transformed data to a database in Azure SQL Database. In this scenario, a common error condition that you must guard against is column truncation.

There are two primary methods to gracefully handle errors when writing data to your database sink in ADF data flows:

  • Set the sink error row handling to "Continue on Error" when processing database data. This is an automated catch-all method that does not require custom logic in your data flow.
  • Alternatively, follow the steps below to provide logging of columns that won't fit into a target string column, allowing your data flow to continue.

Note

When enabling automatic error row handling, as opposed to writing your own error handling logic as described below, there is a small performance penalty: ADF takes an additional step to perform a two-phase operation that traps errors.

Scenario

  1. We have a target database table with an nvarchar(5) column called "name". (A hedged T-SQL sketch of such a table follows this list.)

  2. Inside of our data flow, we want to map movie titles from our source to that target "name" column.

    (Image: Movie data flow)

  3. The problem is that not every movie title will fit within a sink column that can only hold five characters. When you execute this data flow, you will receive an error like this one: "Job failed due to reason: DF-SYS-01 at Sink 'WriteToDatabase': java.sql.BatchUpdateException: String or binary data would be truncated. java.sql.BatchUpdateException: String or binary data would be truncated."
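
To make the scenario concrete, here is a minimal T-SQL sketch of such a target table. The table name dbo.Movies and the extra id column are assumptions for illustration; only the nvarchar(5) "name" column matters for this scenario:

    -- Hypothetical target table; only the nvarchar(5) "name" column matters here.
    CREATE TABLE dbo.Movies
    (
        id   int IDENTITY(1,1) PRIMARY KEY,
        name nvarchar(5) NOT NULL
    );

    -- Fits: five characters or fewer.
    INSERT INTO dbo.Movies (name) VALUES (N'Up');

    -- Fails with "String or binary data would be truncated."
    INSERT INTO dbo.Movies (name) VALUES (N'Citizen Kane');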

How to design around this condition

  1. In this scenario, the maximum length of the "name" column is five characters. So, let's add a conditional split transformation that allows us to log rows whose "title" is longer than five characters, while letting the rest of the rows, which fit into that space, write to the database. (The equivalent predicate is sketched in T-SQL after this list.)

    (Image: Conditional split)

  2. This conditional split transformation defines the maximum length of "title" to be five characters. Any row with a title of five characters or fewer goes into the GoodRows stream; any longer row goes into the BadRows stream.

  3. Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for logging. Here, we'll "auto-map" all of the fields so that we log the complete transaction record. The output is a delimited-text (CSV) file written to a single file in Blob Storage. We'll call the log file "badrows.csv".

    (Image: Error rows)

  4. The completed data flow is shown below. We are now able to split off error rows to avoid the SQL truncation errors and put those entries into a log file. Meanwhile, successful rows can continue to write to our target database.

    (Image: Completed data flow)

  5. If you choose the error row handling option in the sink transformation and set "Output error rows", ADF will automatically generate a CSV file output of your row data along with the driver-reported error messages. With that alternative option, you do not need to add that logic manually to your data flow. There is a small performance penalty with this option, because ADF implements a two-phase methodology to trap and log errors.

    (Image: Completed data flow with error rows)
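
For reference, the conditional split condition in step 2 is written in the data flow expression language, along the lines of length(title) <= 5. The same partitioning logic can be sanity-checked in T-SQL against a staging copy of the source data; the staging table name dbo.MoviesStaging below is an assumption for illustration:

    -- Rows that would follow the GoodRows stream (fit into nvarchar(5)).
    SELECT title FROM dbo.MoviesStaging WHERE LEN(title) <= 5;

    -- Rows that would follow the BadRows stream (would be truncated).
    SELECT title FROM dbo.MoviesStaging WHERE LEN(title) > 5;

Note that T-SQL's LEN() ignores trailing spaces, so this check is an approximation of the data flow's length() expression rather than an exact equivalent.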

Next steps

  • Build the rest of your data flow logic by using mapping data flow transformations.