Troubleshoot connector and format issues in mapping data flows in Azure Data Factory

This article describes how to troubleshoot connector and format issues in mapping data flows in Azure Data Factory (ADF).

Cosmos DB & JSON

Support customized schemas in the source

Symptoms

When you use an ADF data flow to move or transfer data from Cosmos DB/JSON into other data stores, some columns of the source data may be missing.

Cause

For schema-free connectors (where the number, names, and data types of columns can differ from row to row), ADF by default uses sample rows (for example, the top 100 or 1000 rows) to infer the schema, and then uses the inferred result as the schema to read data. So if your data store has extra columns that don't appear in the sample rows, the data in those columns is not read, moved, or transferred into the sink data store.
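
The effect of sample-based inference can be illustrated with a minimal sketch in plain Python. This is not ADF code: the functions `infer_schema` and `read_with_schema`, the sample size, and the documents are all hypothetical, and only mimic the behavior described above.

```python
from typing import Any


def infer_schema(rows: list[dict[str, Any]], sample_size: int) -> list[str]:
    """Collect the column names seen in the first `sample_size` rows."""
    columns: list[str] = []
    for row in rows[:sample_size]:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


def read_with_schema(rows: list[dict[str, Any]], schema: list[str]) -> list[dict[str, Any]]:
    """Project every row onto the schema; columns outside it are dropped."""
    return [{name: row.get(name) for name in schema} for row in rows]


# Hypothetical documents: only the last one carries the "discount" column.
documents = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "discount": 0.1},
]

schema = infer_schema(documents, sample_size=2)  # the sample misses row 3
result = read_with_schema(documents, schema)

print(schema)     # ['id', 'name']
print(result[2])  # {'id': 3, 'name': 'c'}
```

Because `discount` never appears in the sampled rows, it is absent from the inferred schema and is silently dropped when the third document is read.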

Recommendation

To override the default behavior and bring in additional fields, ADF provides options for customizing the source schema. In the data flow source projection, you can specify the columns that may be missing from the inferred schema so that they are read, using one of the following options. Option 1 is usually preferred.

  • Option 1: The original source data may be one large file, table, or container that contains millions of rows with complex schemas. Instead of inferring the schema from it, you can create a temporary table or container with a few rows that contain all the columns you want to read, and then do the following:

    1. Use the data flow source Debug settings to run Import projection against the sample file or table and get the complete schema. Follow the steps in the following picture:

      Screenshot that shows the first part of the first option to customize the source schema.

      1. Select Debug settings in the data flow canvas.
      2. In the pop-up pane, select Sample table under the cosmosSource tab, and enter the name of your table in the Table block.
      3. Select Save to save your settings.
      4. Select Import projection.
    2. Change the Debug settings back to use the source dataset for the remaining data movement and transformation. Continue with the steps in the following picture:

      Screenshot that shows the second part of the first option to customize the source schema.

      1. Select Debug settings in the data flow canvas.
      2. In the pop-up pane, select Source dataset under the cosmosSource tab.
      3. Select Save to save your settings.

    Afterwards, the ADF data flow runtime honors the customized schema and uses it to read data from the original data store.

  • Option 2: If you are familiar with the schema of the source data and with the data flow DSL, you can manually update the data flow source script to add the additional or missing columns to read. An example is shown in the following picture:

    Screenshot that shows the second option to customize the source schema.
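
Both options come down to giving the source an explicit projection instead of relying only on sampled rows. The idea can be sketched in plain Python under stated assumptions: this is not ADF code or data flow script, and `customize_projection`, `read_with_projection`, and the sample documents are hypothetical names used only for illustration.

```python
from typing import Any


def customize_projection(inferred: list[str], extra: list[str]) -> list[str]:
    """Append columns the inferred schema missed, without duplicates."""
    return inferred + [name for name in extra if name not in inferred]


def read_with_projection(
    rows: list[dict[str, Any]], projection: list[str]
) -> list[dict[str, Any]]:
    """Read each row using the explicit projection; absent values become None."""
    return [{name: row.get(name) for name in projection} for row in rows]


# Hypothetical documents: only the last one carries the "discount" column.
documents = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "discount": 0.1},
]

inferred = ["id", "name"]  # what sampling the first rows would yield
projection = customize_projection(inferred, ["discount"])
rows = read_with_projection(documents, projection)

print(projection)  # ['id', 'name', 'discount']
print(rows[2])     # {'id': 3, 'name': 'c', 'discount': 0.1}
```

With the customized projection, the `discount` value in the third document is read, and rows that lack the column carry `None` for it.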

Next steps

For more help with troubleshooting, see these resources: