Incrementally load data from a source data store to a destination data store

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

In a data integration solution, incrementally (or delta) loading data after an initial full data load is a widely used scenario. The tutorials in this section show you different ways of loading data incrementally by using Azure Data Factory.

Delta data loading from a database by using a watermark

In this case, you define a watermark in your source database. A watermark is a column that has the last updated time stamp or an incrementing key. The delta loading solution loads the changed data between an old watermark and a new watermark. The workflow for this approach is depicted in the following diagram:

Workflow for using a watermark

For step-by-step instructions, see the following tutorials:

For templates, see the following:

Delta data loading from SQL DB by using the Change Tracking technology

Change Tracking technology is a lightweight solution in SQL Server and Azure SQL Database that provides an efficient change tracking mechanism for applications. It enables an application to easily identify data that was inserted, updated, or deleted.

The workflow for this approach is depicted in the following diagram:

Workflow for using Change Tracking
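SQL Server exposes change tracking through T-SQL functions such as `CHANGE_TRACKING_CURRENT_VERSION()` and `CHANGETABLE(CHANGES <table>, @last_sync_version)`. The Python sketch below is a toy in-memory model of that idea, not SQL Server's implementation. Every insert, update, or delete bumps a version counter, and a consumer asks for the net changes since the version it last synced. The class `ChangeTrackedTable` and the function `sync` are hypothetical names used for illustration. The I/U/D operation codes are loosely modeled on the `SYS_CHANGE_OPERATION` column.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeTrackedTable:
    """Toy model of version-based change tracking (not SQL Server's implementation)."""
    rows: dict = field(default_factory=dict)
    # key -> [creation_version, last_change_version, deleted]
    tracking: dict = field(default_factory=dict)
    version: int = 0  # analogous to CHANGE_TRACKING_CURRENT_VERSION()

    def insert(self, key, value):
        self.version += 1
        self.rows[key] = value
        self.tracking[key] = [self.version, self.version, False]

    def update(self, key, value):
        self.version += 1
        self.rows[key] = value
        self.tracking[key][1] = self.version

    def delete(self, key):
        self.version += 1
        del self.rows[key]
        self.tracking[key][1] = self.version
        self.tracking[key][2] = True

    def changes_since(self, last_sync_version):
        """Net change per key since last_sync_version as (key, op, current_value)."""
        result = []
        for key, (created, last, deleted) in sorted(self.tracking.items()):
            if last <= last_sync_version:
                continue  # unchanged since the last sync
            if deleted:
                op = "D"
            elif created > last_sync_version:
                op = "I"  # inserted (and possibly updated) after the last sync
            else:
                op = "U"
            result.append((key, op, self.rows.get(key)))
        return result


def sync(source, destination, last_sync_version):
    """Apply net changes to a dict destination; return the version to store for next time."""
    current = source.version
    for key, op, value in source.changes_since(last_sync_version):
        if op == "D":
            destination.pop(key, None)
        else:
            destination[key] = value
    return current


source = ChangeTrackedTable()
source.insert(1, "aaaa")
source.insert(2, "bbbb")
destination = {}
last_sync = sync(source, destination, 0)  # initial full load
source.update(1, "aaaa-v2")
source.delete(2)
source.insert(3, "cccc")
print(source.changes_since(last_sync))
# [(1, 'U', 'aaaa-v2'), (2, 'D', None), (3, 'I', 'cccc')]
last_sync = sync(source, destination, last_sync)
print(destination)
# {1: 'aaaa-v2', 3: 'cccc'}
```

Unlike the watermark approach, deletes come through as explicit `D` changes, so the destination can remove them.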

For step-by-step instructions, see the following tutorial:

Loading new and changed files only by using LastModifiedDate

You can use LastModifiedDate to copy only new and changed files to the destination store. ADF scans all the files in the source store, filters them by their LastModifiedDate, and copies only the files that are new or updated since the last run. Be aware that if ADF scans a large number of files but copies only a few, the run can still take a long time, because scanning the files is itself time-consuming.
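The filtering step can be sketched in Python with `pathlib`, assuming a local folder stands in for the source store. The function `files_modified_between` is hypothetical. It keeps files whose modification time falls in a half-open window [start, end), similar in spirit to the copy activity's `modifiedDatetimeStart` and `modifiedDatetimeEnd` settings. It still has to stat every file, which is why scan time grows with the total number of files rather than the number copied.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def files_modified_between(root, start, end):
    """Return files under root whose mtime is in [start, end), in sorted path order."""
    start_ts, end_ts = start.timestamp(), end.timestamp()
    selected = []
    # Every file is visited and stat()-ed, even those that end up filtered out.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and start_ts <= path.stat().st_mtime < end_ts:
            selected.append(path)
    return selected


with tempfile.TemporaryDirectory() as root:
    old = Path(root, "old.csv")
    new = Path(root, "2021", "new.csv")
    new.parent.mkdir()
    for path, day in ((old, 1), (new, 2)):
        path.write_text("id,value\n")
        ts = datetime(2021, 1, day, 12, tzinfo=timezone.utc).timestamp()
        os.utime(path, (ts, ts))  # pretend the file was last modified on that day
    window_start = datetime(2021, 1, 2, tzinfo=timezone.utc)
    window_end = datetime(2021, 1, 3, tzinfo=timezone.utc)
    print([p.name for p in files_modified_between(root, window_start, window_end)])
    # ['new.csv']
```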

For step-by-step instructions, see the following tutorial:

For templates, see the following:

Loading new files only by using a time-partitioned folder or file name

You can copy only new files when the files or folders are already time-partitioned, with time-slice information as part of the file or folder name (for example, /yyyy/mm/dd/file.csv). This is the most performant approach for incrementally loading new files, because each run reads only the folder or files for its own time slice instead of scanning the whole store.
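With time-partitioned folders, a run can compute the exact paths for its time slice instead of listing the store. The Python sketch below assumes a daily `yyyy/mm/dd/file.csv` layout like the example above. The helpers `partition_path` and `paths_for_window` are hypothetical. In an ADF pipeline, the same path is usually built with the `formatDateTime` expression function applied to the window start, for example `formatDateTime(pipeline().parameters.windowStart, 'yyyy/MM/dd')`, where `windowStart` is an assumed pipeline parameter.

```python
from datetime import date, timedelta


def partition_path(slice_day, file_name="file.csv"):
    """Build the relative path for one daily time slice, e.g. 2021/01/05/file.csv."""
    return f"{slice_day:%Y/%m/%d}/{file_name}"


def paths_for_window(start, end, file_name="file.csv"):
    """Return the path of every daily slice in [start, end); no listing or scanning needed."""
    return [
        partition_path(start + timedelta(days=i), file_name)
        for i in range((end - start).days)
    ]


for p in paths_for_window(date(2021, 12, 31), date(2022, 1, 2)):
    print(p)
# 2021/12/31/file.csv
# 2022/01/01/file.csv
```

The cost of each run depends only on the number of slices in the window, not on how many files have accumulated in the store.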

For step-by-step instructions, see the following tutorial:

Next steps

Advance to the following tutorial: