Bulk copy from files to database

Applies to: Azure Data Factory and Azure Synapse Analytics

This article describes a solution template that you can use to copy data in bulk from Azure Data Lake Storage Gen2 to Azure Synapse Analytics or Azure SQL Database.

About this solution template

This template retrieves files from the Azure Data Lake Storage Gen2 source. It then iterates over each file in the source and copies it to the destination data store.

Currently, this template supports copying data in DelimitedText format only. Files in other data formats can still be retrieved from the source data store, but they cannot be copied to the destination data store.
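
To make the DelimitedText requirement concrete, here is a minimal sketch of what a DelimitedText dataset for an Azure Data Lake Storage Gen2 source looks like, written as a Python dictionary that mirrors the JSON shape Data Factory uses. The dataset name, linked service name, and delimiter settings are illustrative assumptions, not values taken from the template.

```python
# Illustrative sketch only: a DelimitedText dataset definition in the JSON shape
# Azure Data Factory uses, written as a Python dict. Names such as
# "SourceDataset" and "SourceFilesLinkedService" are assumptions for this example.
delimited_text_dataset = {
    "name": "SourceDataset",  # hypothetical dataset name
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "SourceFilesLinkedService",  # hypothetical linked service
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",    # Data Lake Storage Gen2 location
                "fileSystem": "source-container", # container (file system) name
                "folderPath": "source-directory", # directory under the container
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}
```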

The template contains three activities:

  • The Get Metadata activity retrieves the list of files from Azure Data Lake Storage Gen2 and passes it to the subsequent ForEach activity.
  • The ForEach activity receives the file list from the Get Metadata activity and iterates over it, passing each file to the Copy activity.
  • The Copy activity resides inside the ForEach activity and copies each file from the source data store to the destination data store. (A rough sketch of how these activities fit together follows this list.)
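
The sketch below shows the activity chain in the JSON shape a Data Factory pipeline uses, expressed as a Python dictionary. Activity names, source and sink types, and other details are assumptions for illustration; the template's actual definitions may differ.

```python
# Illustrative sketch only: Get Metadata -> ForEach -> Copy, in the JSON shape a
# Data Factory pipeline uses. Activity and dataset names are assumptions.
pipeline_activities = [
    {
        "name": "GetFileList",                      # hypothetical Get Metadata activity
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {"referenceName": "GetMetadataDataset", "type": "DatasetReference"},
            "fieldList": ["childItems"],            # ask for the list of files in the folder
        },
    },
    {
        "name": "ForEachFile",                      # hypothetical ForEach activity
        "type": "ForEach",
        "dependsOn": [{"activity": "GetFileList", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            # Iterate over the file list returned by the Get Metadata activity.
            "items": {"value": "@activity('GetFileList').output.childItems", "type": "Expression"},
            "activities": [
                {
                    "name": "CopyOneFile",          # hypothetical Copy activity
                    "type": "Copy",
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "AzureSqlSink"},  # e.g. an Azure SQL Database sink
                    },
                }
            ],
        },
    },
]
```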

The template defines the following two parameters:

  • SourceContainer is the root container path in your Azure Data Lake Storage Gen2 account from which the data is copied.
  • SourceDirectory is the directory path under the root container from which the data is copied. (The example after this list shows one way to pass these parameters when triggering a run.)
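
If you later trigger the published pipeline programmatically, these two parameters are supplied with the run request. The following is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and pipeline names are placeholders, and the pipeline name depends on what you saved the template as.

```python
# Minimal sketch, assuming the azure-identity and azure-mgmt-datafactory packages.
# Resource names below are placeholders for your own environment.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
client = DataFactoryManagementClient(credential, "<subscription-id>")

run_response = client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory-name>",
    pipeline_name="<pipeline-created-from-template>",
    parameters={
        "SourceContainer": "source-container",   # root container in ADLS Gen2
        "SourceDirectory": "source-directory",   # directory under the root container
    },
)
print(run_response.run_id)  # use this run ID to monitor the pipeline run
```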

How to use this solution template

  1. Go to the Bulk Copy from Files to Database template. Create a New connection to the source Gen2 store. Be aware that "GetMetadataDataset" and "SourceDataset" are references to the same connection to your source file store.

    Create a new connection to the source data store

  2. Create a New connection to the sink data store that you're copying data to.

    Create a new connection to the sink data store

  3. Select Use this template.

    Use this template

  4. You'll see a pipeline created, as shown in the following example:

    Review the pipeline

    Note

    If you chose Azure Synapse Analytics as the data destination in step 2, you must enter a connection to Azure Blob storage for staging, as required by Azure Synapse Analytics PolyBase. As the following screenshot shows, the template automatically generates a storage path for your Blob storage. Check whether the container has been created after the pipeline run. (A hedged sketch of these staging settings follows the steps below.)

    PolyBase settings

  5. Select Debug, enter the Parameters, and then select Finish.

    Select Debug

  6. When the pipeline run completes successfully, you'll see results similar to the following example:

    Review the results
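
As a companion to the note in step 4, the sketch below shows how a staged copy into Azure Synapse Analytics typically looks in the Copy activity's JSON, written as a Python dictionary. The linked service name and staging path are assumptions; the template generates its own storage path for the Blob storage you connect.

```python
# Illustrative sketch only: staged copy into Azure Synapse Analytics through
# PolyBase, in the JSON shape of a Copy activity. Names are assumptions.
copy_sink_with_staging = {
    "sink": {
        "type": "SqlDWSink",
        "allowPolyBase": True,                      # load into Synapse through PolyBase
    },
    "enableStaging": True,
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "StagingBlobStorage",  # hypothetical Blob storage connection
            "type": "LinkedServiceReference",
        },
        "path": "staging-container",                # interim container used during the copy
    },
}
```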

Next steps