Copy multiple tables in bulk by using Azure Data Factory in the Azure portal

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

This tutorial demonstrates copying a number of tables from Azure SQL Database to Azure Synapse Analytics (formerly SQL Data Warehouse). You can apply the same pattern in other copy scenarios as well, for example, copying tables from SQL Server/Oracle to Azure SQL Database/Azure Synapse Analytics (formerly SQL DW)/Azure Blob, or copying different paths from Blob to Azure SQL Database tables.

At a high level, this tutorial involves the following steps:

  • Create a data factory.
  • Create Azure SQL Database, Azure Synapse Analytics (formerly SQL DW), and Azure Storage linked services.
  • Create Azure SQL Database and Azure Synapse Analytics (formerly SQL DW) datasets.
  • Create a pipeline to look up the tables to be copied and another pipeline to perform the actual copy operation.
  • Start a pipeline run.
  • Monitor the pipeline and activity runs.

This tutorial uses the Azure portal. To learn about using other tools/SDKs to create a data factory, see Quickstarts.

End-to-end workflow

In this scenario, you have a number of tables in Azure SQL Database that you want to copy to Azure Synapse Analytics (formerly SQL DW). Here is the logical sequence of steps in the workflow that happens in pipelines:

Workflow diagram

  • The first pipeline looks up the list of tables that need to be copied over to the sink data stores. Alternatively, you can maintain a metadata table that lists all the tables to be copied to the sink data store. Then, the pipeline triggers another pipeline, which iterates over each table in the database and performs the data copy operation.
  • The second pipeline performs the actual copy. It takes the list of tables as a parameter. For each table in the list, it copies the specific table in Azure SQL Database to the corresponding table in Azure Synapse Analytics (formerly SQL DW) by using a staged copy via Blob storage and PolyBase for the best performance. In this example, the first pipeline passes the list of tables as a value for the parameter.

If you don't have an Azure subscription, create a 1rmb trial account before you begin.

Prerequisites

  • Azure Storage account. The Azure Storage account is used as staging blob storage in the bulk copy operation.
  • Azure SQL Database. This database contains the source data.
  • Azure Synapse Analytics (formerly SQL DW). This data warehouse holds the data copied over from the SQL Database.

Prepare SQL Database and Azure Synapse Analytics (formerly SQL DW)

Prepare the source Azure SQL Database:

Create a database with the Adventure Works LT sample data in SQL Database by following the Create a database in Azure SQL Database article. This tutorial copies all the tables from this sample database to Azure Synapse Analytics (formerly SQL DW).

Prepare the sink Azure Synapse Analytics (formerly SQL DW):

  1. If you don't have an Azure Synapse Analytics (formerly SQL DW) workspace, see the Get started with Azure Synapse Analytics article for steps to create one.

  2. Create corresponding table schemas in Azure Synapse Analytics (formerly SQL DW). You use Azure Data Factory to migrate/copy the data in a later step.
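
    For illustration, here is a minimal, hypothetical sketch of one matching table definition on the sink side, assuming the SalesLT.Customer table from the sample database. The column list is abbreviated; match the columns and types of your actual source tables.

      -- Recreate the schema and one table in the sink data warehouse (illustrative only).
      CREATE SCHEMA SalesLT;
      GO

      CREATE TABLE SalesLT.Customer
      (
          CustomerID   INT          NOT NULL,
          FirstName    NVARCHAR(50) NOT NULL,
          LastName     NVARCHAR(50) NOT NULL,
          EmailAddress NVARCHAR(50) NULL
      )
      WITH (DISTRIBUTION = ROUND_ROBIN); -- simple default; choose a distribution per table size and query pattern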

Azure services to access SQL server

For both SQL Database and Azure Synapse Analytics (formerly SQL DW), allow Azure services to access SQL server. Ensure that the Allow Azure services and resources to access this server setting is turned ON for your server. This setting allows the Data Factory service to read data from your Azure SQL Database and write data to your Azure Synapse Analytics (formerly SQL DW).

To verify and turn on this setting, go to your server > Security > Firewalls and virtual networks > set Allow Azure services and resources to access this server to ON.
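
If you prefer scripting over the portal, the same setting can be enabled with T-SQL against the logical server's master database; a sketch, assuming you're connected as the server admin (the 0.0.0.0 range is the documented sentinel that corresponds to this toggle):

    -- Run in the master database of your logical SQL server.
    EXECUTE sp_set_firewall_rule
        @name = N'AllowAllWindowsAzureIps',
        @start_ip_address = '0.0.0.0',
        @end_ip_address = '0.0.0.0';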

Create a data factory

  1. Launch the Microsoft Edge or Google Chrome web browser. Currently, the Data Factory UI is supported only in the Microsoft Edge and Google Chrome web browsers.

  2. Go to the Azure portal.

  3. On the left of the Azure portal menu, select Create a resource > Data + Analytics > Data Factory.

  4. On the New data factory page, enter ADFTutorialBulkCopyDF for the name.

    The name of the Azure data factory must be globally unique. If you see the following error for the name field, change the name of the data factory (for example, yournameADFTutorialBulkCopyDF). See the Data Factory - Naming Rules article for naming rules for Data Factory artifacts.

    Data factory name "ADFTutorialBulkCopyDF" is not available
    
  5. Select the Azure subscription in which you want to create the data factory.

  6. For the Resource Group, do one of the following steps:

    • Select Use existing, and select an existing resource group from the drop-down list.

    • Select Create new, and enter the name of a resource group.

      To learn about resource groups, see Using resource groups to manage your Azure resources.

  7. Select V2 for the version.

  8. Select the location for the data factory. For a list of Azure regions in which Data Factory is currently available, select the regions that interest you on the following page, and then expand Analytics to locate Data Factory: Products available by region. The data stores (Azure Storage, Azure SQL Database, and so on) and computes (HDInsight and so on) used by the data factory can be in other regions.

  9. Click Create.

  10. After the creation is complete, select Go to resource to navigate to the Data Factory page.

  11. Click the Author & Monitor tile to launch the Data Factory UI application in a separate tab.

  12. On the Let's get started page, switch to the Author tab in the left panel as shown in the following image:

    Get started page

Create linked services

You create linked services to link your data stores and computes to a data factory. A linked service has the connection information that the Data Factory service uses to connect to the data store at runtime.

In this tutorial, you link your Azure SQL Database, Azure Synapse Analytics (formerly SQL DW), and Azure Blob Storage data stores to your data factory. The Azure SQL Database is the source data store. The Azure Synapse Analytics (formerly SQL DW) is the sink/destination data store. The Azure Blob Storage is used to stage the data before it's loaded into Azure Synapse Analytics (formerly SQL DW) by using PolyBase.

Create the source Azure SQL Database linked service

In this step, you create a linked service to link your database in Azure SQL Database to the data factory.

  1. Open the Manage tab from the left pane.

  2. On the Linked services page, select + New to create a new linked service.

    New linked service

  3. In the New Linked Service window, select Azure SQL Database, and click Continue.

  4. In the New Linked Service (Azure SQL Database) window, do the following steps:

    a. Enter AzureSqlDatabaseLinkedService for Name.

    b. Select your server for Server name.

    c. Select your database for Database name.

    d. Enter the name of the user to connect to your database.

    e. Enter the password for the user.

    f. To test the connection to your database using the specified information, click Test connection.

    g. Click Create to save the linked service.
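
    Behind the portal form, a linked service is stored as a JSON definition. The following is a rough sketch of what gets generated, with placeholder server, database, and credential values; the endpoint suffix shown assumes the Azure China cloud used in this tutorial:

      {
          "name": "AzureSqlDatabaseLinkedService",
          "properties": {
              "type": "AzureSqlDatabase",
              "typeProperties": {
                  "connectionString": "Server=tcp:<your-server>.database.chinacloudapi.cn,1433;Database=<your-database>;User ID=<user>;Password=<password>;Encrypt=true;Connection Timeout=30"
              }
          }
      }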

Create the sink Azure Synapse Analytics (formerly SQL DW) linked service

  1. In the Connections tab, click + New on the toolbar again.

  2. In the New Linked Service window, select Azure Synapse Analytics (formerly SQL DW), and click Continue.

  3. In the New Linked Service (Azure Synapse Analytics (formerly SQL DW)) window, do the following steps:

    a. Enter AzureSqlDWLinkedService for Name.

    b. Select your server for Server name.

    c. Select your database for Database name.

    d. Enter the user name to connect to your database.

    e. Enter the password for the user.

    f. To test the connection to your database using the specified information, click Test connection.

    g. Click Create.

Create the staging Azure Storage linked service

In this tutorial, you use Azure Blob storage as an interim staging area to enable PolyBase for better copy performance.

  1. In the Connections tab, click + New on the toolbar again.

  2. In the New Linked Service window, select Azure Blob Storage, and click Continue.

  3. In the New Linked Service (Azure Blob Storage) window, do the following steps:

    a. Enter AzureStorageLinkedService for Name.
    b. Select your Azure Storage account for Storage account name.

    c. Click Create.
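
    As with the SQL linked services, the staging store ends up as a JSON definition. A hedged sketch with a placeholder account name and key, again assuming the Azure China endpoint suffix:

      {
          "name": "AzureStorageLinkedService",
          "properties": {
              "type": "AzureBlobStorage",
              "typeProperties": {
                  "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.chinacloudapi.cn"
              }
          }
      }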

Create datasets

In this tutorial, you create source and sink datasets, which specify the location where the data is stored.

The input dataset AzureSqlDatabaseDataset refers to AzureSqlDatabaseLinkedService. The linked service specifies the connection string to connect to the database. The dataset specifies the name of the database and the table that contains the source data.

The output dataset AzureSqlDWDataset refers to AzureSqlDWLinkedService. The linked service specifies the connection string to connect to Azure Synapse Analytics (formerly SQL DW). The dataset specifies the database and the table to which the data is copied.

In this tutorial, the source and destination SQL tables are not hard-coded in the dataset definitions. Instead, the ForEach activity passes the name of the table at runtime to the Copy activity.

Create a dataset for the source SQL Database

  1. Click + (plus) in the left pane, and then click Dataset.

    New dataset menu

  2. In the New Dataset window, select Azure SQL Database, and then click Continue.

  3. In the Set properties window, under Name, enter AzureSqlDatabaseDataset. Under Linked service, select AzureSqlDatabaseLinkedService. Then click OK.

  4. Switch to the Connection tab, and select any table for Table. This table is a dummy table. You specify a query on the source dataset when you create a pipeline, and the query is used to extract data from your database. Alternatively, you can click the Edit check box and enter dbo.dummyName as the table name.
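
    For reference, a hedged sketch of the resulting dataset JSON, assuming the dbo.dummyName placeholder table from the previous step:

      {
          "name": "AzureSqlDatabaseDataset",
          "properties": {
              "type": "AzureSqlTable",
              "linkedServiceName": {
                  "referenceName": "AzureSqlDatabaseLinkedService",
                  "type": "LinkedServiceReference"
              },
              "typeProperties": {
                  "tableName": "dbo.dummyName"
              }
          }
      }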

Create a dataset for the sink Azure Synapse Analytics (formerly SQL DW)

  1. Click + (plus) in the left pane, and click Dataset.

  2. In the New Dataset window, select Azure Synapse Analytics (formerly SQL DW), and then click Continue.

  3. In the Set properties window, under Name, enter AzureSqlDWDataset. Under Linked service, select AzureSqlDWLinkedService. Then click OK.

  4. Switch to the Parameters tab, click + New, and enter DWTableName for the parameter name. Click + New again, and enter DWSchema for the parameter name. If you copy/paste these names from the page, ensure that there's no trailing space character at the end of DWTableName and DWSchema.

  5. Switch to the Connection tab.

    1. For Table, check the Edit option. Click into the first input box and click the Add dynamic content link below. In the Add Dynamic Content page, click DWSchema under Parameters, which automatically populates the top expression text box with @dataset().DWSchema, and then click Finish.

      Dataset connection table name

    2. Click into the second input box and click the Add dynamic content link below. In the Add Dynamic Content page, click DWTableName under Parameters, which automatically populates the top expression text box with @dataset().DWTableName, and then click Finish.

    3. The tableName property of the dataset is set to the values that are passed as arguments for the DWSchema and DWTableName parameters. The ForEach activity iterates through a list of tables and passes them one by one to the Copy activity.
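
    A hedged sketch of the parameterized sink dataset as JSON; the property names follow the ADF v2 dataset schema, so treat it as illustrative rather than the exact payload the UI stores:

      {
          "name": "AzureSqlDWDataset",
          "properties": {
              "type": "AzureSqlDWTable",
              "linkedServiceName": {
                  "referenceName": "AzureSqlDWLinkedService",
                  "type": "LinkedServiceReference"
              },
              "parameters": {
                  "DWSchema": { "type": "String" },
                  "DWTableName": { "type": "String" }
              },
              "typeProperties": {
                  "schema": { "value": "@dataset().DWSchema", "type": "Expression" },
                  "table": { "value": "@dataset().DWTableName", "type": "Expression" }
              }
          }
      }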

Create pipelines

In this tutorial, you create two pipelines: IterateAndCopySQLTables and GetTableListAndTriggerCopyData.

The GetTableListAndTriggerCopyData pipeline performs two actions:

  • Looks up the Azure SQL Database system table to get the list of tables to be copied.
  • Triggers the pipeline IterateAndCopySQLTables to do the actual data copy.

The IterateAndCopySQLTables pipeline takes a list of tables as a parameter. For each table in the list, it copies data from the table in Azure SQL Database to Azure Synapse Analytics (formerly SQL DW) by using a staged copy and PolyBase.

Create the pipeline IterateAndCopySQLTables

  1. In the left pane, click + (plus), and click Pipeline.

    New pipeline menu

  2. In the General panel under Properties, specify IterateAndCopySQLTables for Name. Then collapse the panel by clicking the Properties icon in the top-right corner.

  3. Switch to the Parameters tab, and do the following actions:

    a. Click + New.

    b. Enter tableList for the parameter name.

    c. Select Array for Type.

  4. In the Activities toolbox, expand Iteration & Conditionals, and drag-drop the ForEach activity to the pipeline design surface. You can also search for activities in the Activities toolbox.

    a. In the General tab at the bottom, enter IterateSQLTables for Name.

    b. Switch to the Settings tab, click the input box for Items, and then click the Add dynamic content link below.

    c. In the Add Dynamic Content page, collapse the System Variables and Functions sections, click tableList under Parameters, which automatically populates the top expression text box as @pipeline().parameters.tableList. Then click Finish.

    ForEach parameter builder

    d. Switch to the Activities tab, and click the pencil icon to add a child activity to the ForEach activity.

    ForEach activity builder

  5. In the Activities toolbox, expand Move & Transform, and drag-drop the Copy data activity into the pipeline designer surface. Notice the breadcrumb menu at the top: IterateAndCopySQLTables is the pipeline name, and IterateSQLTables is the ForEach activity name. The designer is in the activity scope. To switch back to the pipeline editor from the ForEach editor, you can click the link in the breadcrumb menu.

    Copy in the ForEach activity

  6. Switch to the Source tab, and do the following steps:

    1. Select AzureSqlDatabaseDataset for Source Dataset.

    2. Select the Query option for Use query.

    3. Click the Query input box, select the Add dynamic content below, enter the following expression for Query, and then select Finish:

      SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]
      
  7. Switch to the Sink tab, and do the following steps:

    1. Select AzureSqlDWDataset for Sink Dataset.

    2. Click the input box for the VALUE of the DWTableName parameter, select the Add dynamic content below, enter the expression @item().TABLE_NAME as the script, and then select Finish.

    3. Click the input box for the VALUE of the DWSchema parameter, select the Add dynamic content below, enter the expression @item().TABLE_SCHEMA as the script, and then select Finish.

    4. For Copy method, select PolyBase.

    5. Clear the Use type default option.

    6. Click the Pre-copy Script input box, select the Add dynamic content below, enter the following expression as the script, and then select Finish:

      TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]
      

      Copy sink settings

  8. Switch to the Settings tab, and do the following steps:

    1. Select the checkbox for Enable staging.
    2. Select AzureStorageLinkedService for Staging account linked service.
  9. To validate the pipeline settings, click Validate on the pipeline toolbar. Make sure that there's no validation error. To close the Pipeline Validation Report, click >>.
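
    If you'd rather see the whole thing at once, the pipeline you just built corresponds roughly to the following JSON definition; a hedged sketch that abbreviates names and omits UI defaults, not the exact payload the service stores:

      {
          "name": "IterateAndCopySQLTables",
          "properties": {
              "parameters": {
                  "tableList": { "type": "Array" }
              },
              "activities": [
                  {
                      "name": "IterateSQLTables",
                      "type": "ForEach",
                      "typeProperties": {
                          "items": {
                              "value": "@pipeline().parameters.tableList",
                              "type": "Expression"
                          },
                          "activities": [
                              {
                                  "name": "CopyData",
                                  "type": "Copy",
                                  "inputs": [
                                      { "referenceName": "AzureSqlDatabaseDataset", "type": "DatasetReference" }
                                  ],
                                  "outputs": [
                                      {
                                          "referenceName": "AzureSqlDWDataset",
                                          "type": "DatasetReference",
                                          "parameters": {
                                              "DWSchema": "@item().TABLE_SCHEMA",
                                              "DWTableName": "@item().TABLE_NAME"
                                          }
                                      }
                                  ],
                                  "typeProperties": {
                                      "source": {
                                          "type": "AzureSqlSource",
                                          "sqlReaderQuery": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
                                      },
                                      "sink": {
                                          "type": "SqlDWSink",
                                          "preCopyScript": "TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
                                          "allowPolyBase": true,
                                          "polyBaseSettings": { "useTypeDefault": false }
                                      },
                                      "enableStaging": true,
                                      "stagingSettings": {
                                          "linkedServiceName": {
                                              "referenceName": "AzureStorageLinkedService",
                                              "type": "LinkedServiceReference"
                                          }
                                      }
                                  }
                              }
                          ]
                      }
                  }
              ]
          }
      }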

Create the pipeline GetTableListAndTriggerCopyData

This pipeline does two actions:

  • Looks up the Azure SQL Database system table to get the list of tables to be copied.
  • Triggers the pipeline IterateAndCopySQLTables to do the actual data copy.
  1. In the left pane, click + (plus), and click Pipeline.

  2. In the General panel under Properties, change the name of the pipeline to GetTableListAndTriggerCopyData.

  3. In the Activities toolbox, expand General, drag-drop the Lookup activity to the pipeline designer surface, and do the following steps:

    1. Enter LookupTableList for Name.
    2. Enter Retrieve the table list from my database for Description.
  4. Switch to the Settings tab, and do the following steps:

    1. Select AzureSqlDatabaseDataset for Source Dataset.

    2. Select Query for Use query.

    3. Enter the following SQL query for Query:

      SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'SalesLT' and TABLE_NAME <> 'ProductModel'
      
    4. Clear the checkbox for the First row only field.

      Lookup activity - Settings page

  5. Drag-drop the Execute Pipeline activity from the Activities toolbox to the pipeline designer surface, and set the name to TriggerCopy.

  6. To connect the Lookup activity to the Execute Pipeline activity, drag the green box attached to the Lookup activity to the left of the Execute Pipeline activity.

    Connect Lookup and Execute Pipeline activities

  7. Switch to the Settings tab of the Execute Pipeline activity, and do the following steps:

    1. Select IterateAndCopySQLTables for Invoked pipeline.

    2. Clear the checkbox for Wait on completion.

    3. In the Parameters section, click the input box under VALUE, select the Add dynamic content below, enter @activity('LookupTableList').output.value as the table name value, and then select Finish. You're setting the result list from the Lookup activity as an input to the second pipeline. The result list contains the list of tables whose data needs to be copied to the destination.

      Execute Pipeline activity - Settings page

  8. To validate the pipeline, click Validate on the toolbar. Confirm that there are no validation errors. To close the Pipeline Validation Report, click >>.

  9. To publish entities (datasets, pipelines, and so on) to the Data Factory service, click Publish all at the top of the window. Wait until the publishing succeeds.
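
    This driver pipeline, too, boils down to a compact JSON definition. A hedged sketch (same caveats as before) showing how the Lookup output feeds the Execute Pipeline activity's tableList parameter:

      {
          "name": "GetTableListAndTriggerCopyData",
          "properties": {
              "activities": [
                  {
                      "name": "LookupTableList",
                      "type": "Lookup",
                      "typeProperties": {
                          "source": {
                              "type": "AzureSqlSource",
                              "sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'SalesLT' and TABLE_NAME <> 'ProductModel'"
                          },
                          "dataset": { "referenceName": "AzureSqlDatabaseDataset", "type": "DatasetReference" },
                          "firstRowOnly": false
                      }
                  },
                  {
                      "name": "TriggerCopy",
                      "type": "ExecutePipeline",
                      "dependsOn": [
                          { "activity": "LookupTableList", "dependencyConditions": [ "Succeeded" ] }
                      ],
                      "typeProperties": {
                          "pipeline": { "referenceName": "IterateAndCopySQLTables", "type": "PipelineReference" },
                          "waitOnCompletion": false,
                          "parameters": {
                              "tableList": {
                                  "value": "@activity('LookupTableList').output.value",
                                  "type": "Expression"
                              }
                          }
                      }
                  }
              ]
          }
      }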

Trigger a pipeline run

  1. Go to the pipeline GetTableListAndTriggerCopyData, click Add Trigger on the pipeline toolbar, and then click Trigger now.

  2. Confirm the run on the Pipeline run page, and then select Finish.

Monitor the pipeline run

  1. Switch to the Monitor tab. Click Refresh until you see runs for both of the pipelines in your solution. Continue refreshing the list until you see the Succeeded status.

  2. To view the activity runs associated with the GetTableListAndTriggerCopyData pipeline, click the pipeline name link. You should see two activity runs for this pipeline run.

    Monitor pipeline run

  3. To view the output of the Lookup activity, click the Output link next to the activity under the ACTIVITY NAME column. You can maximize and restore the Output window. After reviewing, click X to close the Output window.

    {
        "count": 9,
        "value": [
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "Customer"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "ProductDescription"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "Product"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "ProductModelProductDescription"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "ProductCategory"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "Address"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "CustomerAddress"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "SalesOrderDetail"
            },
            {
                "TABLE_SCHEMA": "SalesLT",
                "TABLE_NAME": "SalesOrderHeader"
            }
        ],
        "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (China East 2)",
        "effectiveIntegrationRuntimes": [
            {
                "name": "DefaultIntegrationRuntime",
                "type": "Managed",
                "location": "China East 2",
                "billedDuration": 0,
                "nodes": null
            }
        ]
    }
    
  4. To switch back to the Pipeline Runs view, click the All pipeline runs link at the top of the breadcrumb menu. Click the IterateAndCopySQLTables link (under the PIPELINE NAME column) to view the activity runs of that pipeline. Notice that there's one Copy activity run for each table in the Lookup activity output.

  5. Confirm that the data was copied to the target Azure Synapse Analytics (formerly SQL DW) you used in this tutorial.
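
    As a quick spot check, you can query the sink side and compare against the Lookup output above; a minimal sketch, assuming the SalesLT tables landed as expected:

      -- List the copied tables on the sink side.
      SELECT TABLE_SCHEMA, TABLE_NAME
      FROM information_schema.TABLES
      WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = 'SalesLT';

      -- Spot-check one table's row count against the source.
      SELECT COUNT(*) AS CustomerRows FROM SalesLT.Customer;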

Next steps

You performed the following steps in this tutorial:

  • Create a data factory.
  • Create Azure SQL Database, Azure Synapse Analytics (formerly SQL DW), and Azure Storage linked services.
  • Create Azure SQL Database and Azure Synapse Analytics (formerly SQL DW) datasets.
  • Create a pipeline to look up the tables to be copied and another pipeline to perform the actual copy operation.
  • Start a pipeline run.
  • Monitor the pipeline and activity runs.

Advance to the following tutorial to learn how to copy data incrementally from a source to a destination: