Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal

Applies to: Azure Data Factory, Azure Synapse Analytics

In this tutorial, you create an Azure data factory with a pipeline that loads delta data from a table in Azure SQL Database to Azure Blob storage.

You perform the following steps in this tutorial:

  • Prepare the data store to store the watermark value.
  • Create a data factory.
  • Create linked services.
  • Create source, sink, and watermark datasets.
  • Create a pipeline.
  • Run the pipeline.
  • Monitor the pipeline run.
  • Review the results.
  • Add more data to the source.
  • Run the pipeline again.
  • Monitor the second pipeline run.
  • Review the results from the second run.

Overview

Here is the high-level solution diagram:

    Incrementally load data

Here are the important steps to create this solution:

  1. Select the watermark column. Select one column in the source data store that can be used to slice the new or updated records for every run. Normally, the data in this selected column (for example, last_modify_time or ID) keeps increasing as rows are created or updated. The maximum value in this column is used as the watermark.

  2. Prepare a data store to store the watermark value. In this tutorial, you store the watermark value in a SQL database.

  3. Create a pipeline with the following workflow:

    The pipeline in this solution has the following activities:

    • Create two Lookup activities. Use the first Lookup activity to retrieve the last watermark value. Use the second Lookup activity to retrieve the new watermark value. These watermark values are passed to the Copy activity.
    • Create a Copy activity that copies rows from the source data store whose watermark column value is greater than the old watermark value and less than or equal to the new watermark value. It then copies this delta data from the source data store to Blob storage as a new file.
    • Create a StoredProcedure activity that updates the watermark value for the next pipeline run. A minimal T-SQL sketch of this watermark pattern is shown below.
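
    The following T-SQL sketch shows, for illustration only, the logic these activities implement end to end; it uses the tables and stored procedure you create in the next sections, and you don't run it yourself (the pipeline issues the equivalent queries for you):

    -- 1. Old watermark: the value recorded by the previous run.
    SELECT WatermarkValue FROM watermarktable WHERE TableName = 'data_source_table';

    -- 2. New watermark: the current maximum of the watermark column in the source.
    SELECT MAX(LastModifytime) AS NewWatermarkvalue FROM data_source_table;

    -- 3. Delta copy: rows changed between the two watermarks (written to Blob storage by the Copy activity).
    SELECT * FROM data_source_table
    WHERE LastModifytime > '<old watermark>' AND LastModifytime <= '<new watermark>';

    -- 4. Persist the new watermark for the next run.
    EXEC usp_write_watermark '<new watermark>', 'data_source_table';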

If you don't have an Azure subscription, create a trial account before you begin.

Prerequisites

  • Azure SQL Database. You use the database as the source data store. If you don't have a database in Azure SQL Database, see Create a database in Azure SQL Database for the steps to create one.
  • Azure Storage. You use the blob storage as the sink data store. If you don't have a storage account, see Create a storage account for the steps to create one. Create a container named adftutorial.

Create a data source table in your SQL database

  1. Open SQL Server Management Studio. In Server Explorer, right-click the database, and choose New Query.

  2. Run the following SQL command against your SQL database to create a table named data_source_table as the data source store:

    create table data_source_table
    (
        PersonID int,
        Name varchar(255),
        LastModifytime datetime
    );
    
    INSERT INTO data_source_table
    (PersonID, Name, LastModifytime)
    VALUES
    (1, 'aaaa','9/1/2017 12:56:00 AM'),
    (2, 'bbbb','9/2/2017 5:23:00 AM'),
    (3, 'cccc','9/3/2017 2:36:00 AM'),
    (4, 'dddd','9/4/2017 3:21:00 AM'),
    (5, 'eeee','9/5/2017 8:06:00 AM');
    

    In this tutorial, you use LastModifytime as the watermark column. The data in the data source store is shown in the following table:

    PersonID | Name | LastModifytime
    -------- | ---- | --------------
    1 | aaaa | 2017-09-01 00:56:00.000
    2 | bbbb | 2017-09-02 05:23:00.000
    3 | cccc | 2017-09-03 02:36:00.000
    4 | dddd | 2017-09-04 03:21:00.000
    5 | eeee | 2017-09-05 08:06:00.000
    

Create another table in your SQL database to store the high watermark value

  1. Run the following SQL command against your SQL database to create a table named watermarktable to store the watermark value:

    create table watermarktable
    (
        TableName varchar(255),
        WatermarkValue datetime
    );
    
  2. Set the default value of the high watermark with the table name of the source data store. In this tutorial, the table name is data_source_table.

    INSERT INTO watermarktable
    VALUES ('data_source_table','1/1/2010 12:00:00 AM')    
    
  3. Review the data in the table watermarktable.

    Select * from watermarktable
    

    Output:

    TableName  | WatermarkValue
    ----------  | --------------
    data_source_table | 2010-01-01 00:00:00.000
    

Create a stored procedure in your SQL database

Run the following command to create a stored procedure in your SQL database:

CREATE PROCEDURE usp_write_watermark @LastModifiedtime datetime, @TableName varchar(50)
AS

BEGIN

UPDATE watermarktable
SET [WatermarkValue] = @LastModifiedtime
WHERE [TableName] = @TableName

END
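
Optionally, you can verify the procedure by calling it manually. The call below simply rewrites the default watermark value, so it leaves the tutorial state unchanged; the values are illustrative:

-- Optional check: calling the procedure with the default watermark value
-- leaves the watermark unchanged, so the first pipeline run still copies all rows.
EXEC usp_write_watermark @LastModifiedtime = '1/1/2010 12:00:00 AM', @TableName = 'data_source_table';

-- Confirm that the watermark row still holds the default value.
SELECT * FROM watermarktable;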

Create a data factory

  1. Launch the Microsoft Edge or Google Chrome web browser. Currently, the Data Factory UI is supported only in the Microsoft Edge and Google Chrome web browsers.

  2. On the left menu, select Create a resource > Data + Analytics > Data Factory:

    Select Data Factory in the New pane

  3. On the New data factory page, enter ADFIncCopyTutorialDF for the name.

    The name of the Azure data factory must be globally unique. If you see a red exclamation mark with the following error, change the name of the data factory (for example, to yournameADFIncCopyTutorialDF) and try creating it again. See the Data Factory - Naming rules article for naming rules for Data Factory artifacts.

    Data factory name "ADFIncCopyTutorialDF" is not available

  4. Select the Azure subscription in which you want to create the data factory.

  5. For Resource Group, take one of the following steps:

    • Select Use existing, and select an existing resource group from the drop-down list.

    • Select Create new, and enter the name of a resource group.

      To learn about resource groups, see Using resource groups to manage your Azure resources.

  6. Select the location for the data factory. Only supported locations are displayed in the drop-down list. The data stores (Azure Storage, Azure SQL Database, Azure SQL Managed Instance, and so on) and computes (HDInsight, and so on) used by the data factory can be in other regions.

  7. Click Create.

  8. After the creation is complete, you see the Data Factory page as shown in the image.

    Home page for Azure Data Factory, with the Author & Monitor tile

  9. Click the Author & Monitor tile to launch the Azure Data Factory user interface (UI) in a separate tab.

Create a pipeline

In this tutorial, you create a pipeline with two Lookup activities, one Copy activity, and one StoredProcedure activity chained together in one pipeline.

  1. On the get started page of the Data Factory UI, click the Create pipeline tile.

    Get started page of the Data Factory UI

  2. In the General panel under Properties, specify IncrementalCopyPipeline for Name. Then collapse the panel by clicking the Properties icon in the top-right corner.

  3. Add the first Lookup activity to get the old watermark value. In the Activities toolbox, expand General, and drag the Lookup activity to the pipeline designer surface. Change the name of the activity to LookupOldWaterMarkActivity.

    First Lookup activity - Name

  4. Switch to the Settings tab, and click + New for Source dataset. In this step, you create a dataset to represent the data in watermarktable. This table contains the old watermark that was used in the previous copy operation.

  5. In the New Dataset window, select Azure SQL Database, and click Continue. A new window opens for the dataset.

  6. In the Set properties window for the dataset, enter WatermarkDataset for Name.

  7. For Linked service, select New, and then take the following steps:

    1. Enter AzureSqlDatabaseLinkedService for Name.

    2. Select your server for Server name.

    3. Select your Database name from the drop-down list.

    4. Enter your User name and Password.

    5. To test the connection to your SQL database, click Test connection.

    6. Click Finish.

    7. Confirm that AzureSqlDatabaseLinkedService is selected for Linked service.

      New Linked Service window

    8. Select Finish.

  8. On the Connection tab, select [dbo].[watermarktable] for Table. If you want to preview data in the table, click Preview data.

    Watermark dataset - connection settings

  9. Switch to the pipeline editor by clicking the pipeline tab at the top or by clicking the name of the pipeline in the tree view on the left. In the properties window for the Lookup activity, confirm that WatermarkDataset is selected for the Source dataset field.

  10. In the Activities toolbox, expand General, drag another Lookup activity to the pipeline designer surface, and set its name to LookupNewWaterMarkActivity on the General tab of the properties window. This Lookup activity gets the new watermark value from the table that contains the source data to be copied to the destination.

  11. In the properties window for the second Lookup activity, switch to the Settings tab, and click New. You create a dataset that points to the source table, which contains the new watermark value (the maximum value of LastModifytime).

  12. In the New Dataset window, select Azure SQL Database, and click Continue.

  13. In the Set properties window, enter SourceDataset for Name. Select AzureSqlDatabaseLinkedService for Linked service.

  14. Select [dbo].[data_source_table] for Table. You specify a query on this dataset later in the tutorial. The query takes precedence over the table you specify in this step.

  15. Select Finish.

  16. Switch to the pipeline editor by clicking the pipeline tab at the top or by clicking the name of the pipeline in the tree view on the left. In the properties window for the Lookup activity, confirm that SourceDataset is selected for the Source dataset field.

  17. Select Query for the Use query field, and enter the following query: it selects only the maximum value of LastModifytime from data_source_table. Make sure you have also selected First row only.

    select MAX(LastModifytime) as NewWatermarkvalue from data_source_table
    

    Second Lookup activity - Query

  18. In the Activities toolbox, expand Move & Transform, drag the Copy activity from the Activities toolbox to the pipeline designer surface, and set its name to IncrementalCopyActivity.

  19. Connect both Lookup activities to the Copy activity by dragging the green button attached to each Lookup activity to the Copy activity. Release the mouse button when you see the border color of the Copy activity change to blue.

    Connect the Lookup activities to the Copy activity

  20. Select the Copy activity and confirm that you see its properties in the Properties window.

  21. Switch to the Source tab in the Properties window, and take the following steps:

    1. Select SourceDataset for the Source dataset field.

    2. Select Query for the Use query field.

    3. Enter the following SQL query for the Query field.

      select * from data_source_table where LastModifytime > '@{activity('LookupOldWaterMarkActivity').output.firstRow.WatermarkValue}' and LastModifytime <= '@{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}'
      

      Copy activity - Source
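
      To see how the expressions resolve, this is roughly what the query looks like on the first run, after the outputs of the two Lookup activities are substituted (the old watermark comes from watermarktable, the new one from data_source_table); the values below are illustrative:

      select * from data_source_table
      where LastModifytime > '2010-01-01 00:00:00.000'
        and LastModifytime <= '2017-09-05 08:06:00.000'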

  22. Switch to the Sink tab, and click + New for the Sink dataset field.

  23. In this tutorial, the sink data store is of type Azure Blob Storage. Therefore, select Azure Blob Storage, and click Continue in the New Dataset window.

  24. In the Select Format window, select the format type of your data, and click Continue.

  25. In the Set properties window, enter SinkDataset for Name. For Linked service, select + New. In this step, you create a connection (linked service) to your Azure Blob storage.

  26. In the New Linked Service (Azure Blob Storage) window, take the following steps:

    1. Enter AzureStorageLinkedService for Name.
    2. Select your Azure storage account for Storage account name.
    3. Test the connection, and then click Finish.
  27. In the Set properties window, confirm that AzureStorageLinkedService is selected for Linked service. Then select Finish.

  28. Go to the Connection tab of SinkDataset and take the following steps:

    1. For the File path field, enter adftutorial/incrementalcopy. adftutorial is the blob container name and incrementalcopy is the folder name. This path assumes that you have a blob container named adftutorial in your blob storage. Create the container if it doesn't exist, or set the path to the name of an existing one. Azure Data Factory automatically creates the output folder incrementalcopy if it doesn't exist. You can also use the Browse button for the File path to navigate to a folder in a blob container.
    2. For the File part of the File path field, select Add dynamic content [Alt+P], and then enter @CONCAT('Incremental-', pipeline().RunId, '.txt') in the window that opens. Then select Finish. The file name is dynamically generated by the expression. Each pipeline run has a unique ID, and the Copy activity uses the run ID to generate the file name.
  29. Switch to the pipeline editor by clicking the pipeline tab at the top or by clicking the name of the pipeline in the tree view on the left.

  30. In the Activities toolbox, expand General, and drag the Stored Procedure activity from the Activities toolbox to the pipeline designer surface. Connect the green (Success) output of the Copy activity to the Stored Procedure activity.

  31. Select the Stored Procedure activity in the pipeline designer, and change its name to StoredProceduretoWriteWatermarkActivity.

  32. Switch to the SQL Account tab, and select AzureSqlDatabaseLinkedService for Linked service.

  33. Switch to the Stored Procedure tab, and take the following steps:

    1. For Stored procedure name, select usp_write_watermark.

    2. To specify values for the stored procedure parameters, click Import parameter, and enter the following values for the parameters:

      Name | Type | Value
      ---- | ---- | -----
      LastModifiedtime | DateTime | @{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}
      TableName | String | @{activity('LookupOldWaterMarkActivity').output.firstRow.TableName}

      Stored Procedure activity - stored procedure settings

  34. To validate the pipeline settings, click Validate on the toolbar. Confirm that there are no validation errors. To close the Pipeline Validation Report window, click >>.

  35. Publish the entities (linked services, datasets, and pipeline) to the Azure Data Factory service by selecting the Publish All button. Wait until you see a message that publishing succeeded.

Trigger a pipeline run

  1. Click Add Trigger on the toolbar, and click Trigger Now.

  2. In the Pipeline Run window, select Finish.

Monitor the pipeline run

  1. Switch to the Monitor tab on the left. You see the status of the pipeline run triggered by the manual trigger. You can use the links under the PIPELINE NAME column to view run details and to rerun the pipeline.

  2. To see the activity runs associated with the pipeline run, select the link under the PIPELINE NAME column. For details about the activity runs, select the Details link (eyeglasses icon) under the ACTIVITY NAME column. Select All pipeline runs at the top to go back to the Pipeline runs view. To refresh the view, select Refresh.

Review the results

  1. Connect to your Azure storage account by using a tool such as Azure Storage Explorer. Verify that an output file was created in the incrementalcopy folder of the adftutorial container.

    First output file

  2. Open the output file and notice that all the data was copied from data_source_table to the blob file.

    1,aaaa,2017-09-01 00:56:00.0000000
    2,bbbb,2017-09-02 05:23:00.0000000
    3,cccc,2017-09-03 02:36:00.0000000
    4,dddd,2017-09-04 03:21:00.0000000
    5,eeee,2017-09-05 08:06:00.0000000
    
  3. Check the latest value in watermarktable. You see that the watermark value was updated.

    Select * from watermarktable
    

    Here is the output:

    TableName | WatermarkValue
    --------- | --------------
    data_source_table | 2017-09-05 08:06:00.000

Add more data to the source

Insert new data into your database (the data source store).

INSERT INTO data_source_table
VALUES (6, 'newdata','9/6/2017 2:23:00 AM')

INSERT INTO data_source_table
VALUES (7, 'newdata','9/7/2017 9:01:00 AM')

The updated data in your database is:

PersonID | Name | LastModifytime
-------- | ---- | --------------
1 | aaaa | 2017-09-01 00:56:00.000
2 | bbbb | 2017-09-02 05:23:00.000
3 | cccc | 2017-09-03 02:36:00.000
4 | dddd | 2017-09-04 03:21:00.000
5 | eeee | 2017-09-05 08:06:00.000
6 | newdata | 2017-09-06 02:23:00.000
7 | newdata | 2017-09-07 09:01:00.000

Trigger another pipeline run

  1. Switch to the Edit tab. Click the pipeline in the tree view if it's not already open in the designer.

  2. Click Add Trigger on the toolbar, and click Trigger Now.

Monitor the second pipeline run

  1. Switch to the Monitor tab on the left. You see the status of the pipeline run triggered by the manual trigger. You can use the links under the PIPELINE NAME column to view activity details and to rerun the pipeline.

  2. To see the activity runs associated with the pipeline run, select the link under the PIPELINE NAME column. For details about the activity runs, select the Details link (eyeglasses icon) under the ACTIVITY NAME column. Select All pipeline runs at the top to go back to the Pipeline runs view. To refresh the view, select Refresh.

Verify the second output

  1. In the blob storage, you see that another file was created. In this tutorial, the new file name is Incremental-<GUID>.txt. Open that file, and you see that it contains two rows of records.

    6,newdata,2017-09-06 02:23:00.0000000
    7,newdata,2017-09-07 09:01:00.0000000    
    
  2. Check the latest value in watermarktable. You see that the watermark value was updated again.

    Select * from watermarktable
    

    Sample output:

    TableName | WatermarkValue
    --------- | --------------
    data_source_table | 2017-09-07 09:01:00.000

Next steps

You performed the following steps in this tutorial:

  • Prepare the data store to store the watermark value.
  • Create a data factory.
  • Create linked services.
  • Create source, sink, and watermark datasets.
  • Create a pipeline.
  • Run the pipeline.
  • Monitor the pipeline run.
  • Review the results.
  • Add more data to the source.
  • Run the pipeline again.
  • Monitor the second pipeline run.
  • Review the results from the second run.

In this tutorial, the pipeline copied data from a single table in SQL Database to Blob storage. Advance to the following tutorial to learn how to copy data from multiple tables in a SQL Server database to SQL Database.