使用 Azure 数据工厂将数据加载到 Azure Synapse Analytics 中Load data into Azure Synapse Analytics by using Azure Data Factory

适用于:是 Azure 数据工厂是 Azure Synapse Analytics(预览版)APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

Azure Synapse Analytics(以前称为 SQL DW)是一种基于云的向外扩展数据库,可以处理大量数据(关系数据和非关系数据)。Azure Synapse Analytics (formerly SQL DW) is a cloud-based, scale-out database that's capable of processing massive volumes of data, both relational and non-relational. Azure Synapse Analytics 在大规模并行处理 (MPP) 体系结构的基础上构建,已针对企业数据仓库工作负荷进行优化。Azure Synapse Analytics is built on the massively parallel processing (MPP) architecture that's optimized for enterprise data warehouse workloads. 它通过灵活地缩放存储以及独立计算提供云灵活性。It offers cloud elasticity with the flexibility to scale storage and compute independently.

现在通过使用 Azure 数据工厂,Azure Synapse Analytics 入门变得前所未有的简单。Getting started with Azure Synapse Analytics is now easier than ever when you use Azure Data Factory. Azure 数据工厂是一个完全托管的基于云的数据集成服务。Azure Data Factory is a fully managed cloud-based data integration service. 该服务可用于使用现有系统中的数据填充 Azure Synapse Analytics,在生成分析解决方案时节省时间。You can use the service to populate an Azure Synapse Analytics with data from your existing system and save time when building your analytics solutions.

以下是使用 Azure 数据工厂将数据加载到 Azure Synapse Analytics 的优点:Azure Data Factory offers the following benefits for loading data into Azure Synapse Analytics:

  • 轻松设置:无需脚本的直观 5 步向导。Easy to set up: An intuitive 5-step wizard with no scripting required.
  • 丰富的数据存储支持:对一组丰富的本地和基于云的数据存储的内置支持。Rich data store support: Built-in support for a rich set of on-premises and cloud-based data stores. 有关详细列表,请参阅表支持的数据存储For a detailed list, see the table of Supported data stores.
  • 安全且合规:通过 HTTPS 或 ExpressRoute 传输数据。Secure and compliant: Data is transferred over HTTPS or ExpressRoute. 存在全局服务可确保数据永远不会离开地理边界。The global service presence ensures that your data never leaves the geographical boundary.
  • 通过使用 PolyBase 提供无与伦比的性能:使用 Polybase 是将数据移到 Azure Synapse Analytics 的最高效方法。Unparalleled performance by using PolyBase: Polybase is the most efficient way to move data into Azure Synapse Analytics. 使用暂存 blob 功能,可以从所有类型的数据存储(包括 Azure Blob 存储)实现高加载速度。Use the staging blob feature to achieve high load speeds from all types of data stores, including Azure Blob storage. (默认情况下,Polybase 支持 Azure Blob 存储。)有关详细信息,请参阅复制活动性能(Polybase supports Azure Blob storage by default.) For details, see Copy activity performance.

本文介绍如何使用数据工厂复制数据工具将数据从 Azure SQL 数据库加载至 Azure Synapse Analytics。This article shows you how to use the Data Factory Copy Data tool to load data from Azure SQL Database into Azure Synapse Analytics. 可以遵循类似步骤,从其他类型的数据存储中复制数据。You can follow similar steps to copy data from other types of data stores.

先决条件Prerequisites

  • Azure 订阅:如果没有 Azure 订阅,可在开始前创建一个 1 元人民币试用帐户Azure subscription: If you don't have an Azure subscription, create a 1rmb trial account before you begin.
  • Azure Synapse Analytics:此数据仓库保存从 SQL 数据库复制的数据。Azure Synapse Analytics: The data warehouse holds the data that's copied over from the SQL database. 如果没有 Azure Synapse Analytics,请参阅创建 Azure Synapse Analytics 中的说明。If you don't have an Azure Synapse Analytics, see the instructions in Create an Azure Synapse Analytics.
  • Azure SQL 数据库:本教程从 Azure SQL 数据库中的 Adventure Works LT 示例数据集复制数据。Azure SQL Database: This tutorial copies data from the Adventure Works LT sample dataset in Azure SQL Database . 可以按照在 Azure SQL 数据库中创建示例数据库中的说明在 SQL 数据库中创建此示例数据库。You can create this sample database in SQL Database by following the instructions in Create a sample database in Azure SQL Database.
  • Azure 存储帐户:Azure 存储用作大容量复制操作中的暂存 blob。Azure storage account: Azure Storage is used as the staging blob in the bulk copy operation. 如果没有 Azure 存储帐户,请参阅创建存储帐户中的说明。If you don't have an Azure storage account, see the instructions in Create a storage account.

创建数据工厂Create a data factory

  1. 在左侧菜单中,选择“创建资源” > “数据 + 分析” > “数据工厂”:On the left menu, select Create a resource > Data + Analytics > Data Factory:

  2. 在“新建数据工厂”页上,为以下项提供值:On the New data factory page, provide values for following items:

    • 名称:输入“LoadSQLDWDemo”作为名称。Name: Enter LoadSQLDWDemo for name. 数据工厂的名称必须全局独一无二。The name for your data factory must be *globally unique. 如果收到错误“数据工厂名称‘LoadSQLDWDemo’不可用”,请为数据工厂输入其他名称。If you receive the error "Data factory name 'LoadSQLDWDemo' is not available", enter a different name for the data factory. 例如,可以使用名称 yournameADFTutorialDataFactoryFor example, you could use the name yournameADFTutorialDataFactory. 请重试创建数据工厂。Try creating the data factory again. 有关数据工厂项目的命名规则,请参阅数据工厂命名规则For the naming rules for Data Factory artifacts, see Data Factory naming rules.
    • 订阅:选择要在其中创建数据工厂的 Azure 订阅。Subscription: Select your Azure subscription in which to create the data factory.
    • 资源组:从下拉列表中选择现有资源组,或选择“新建”选项并输入资源组的名称。Resource Group: Select an existing resource group from the drop-down list, or select the Create new option and enter the name of a resource group. 若要了解有关资源组的详细信息,请参阅 使用资源组管理 Azure 资源To learn about resource groups, see Using resource groups to manage your Azure resources.
    • 版本:选择“V2”。Version: Select V2.
    • 位置:选择数据工厂的位置。Location: Select the location for the data factory. 下拉列表中仅显示支持的位置。Only supported locations are displayed in the drop-down list. 数据工厂使用的数据存储可以在其他位置和区域中。The data stores that are used by data factory can be in other locations and regions. 这些数据存储包括 Azure 存储、Azure SQL 数据库,等等。These data stores include Azure Storage, Azure SQL Database, and so on.
  3. 选择“创建” 。Select Create.

  4. 创建操作完成后,请转到数据工厂。After creation is complete, go to your data factory. 此时会看到“数据工厂”主页,如下图所示:You see the Data Factory home page as shown in the following image:

    数据工厂主页

    选择“创作和监视”磁贴,在单独的选项卡中启动数据集成应用程序。Select the Author & Monitor tile to launch the Data Integration Application in a separate tab.

将数据加载到 Azure Synapse Analytics 中Load data into Azure Synapse Analytics

  1. 在“入门”页中,选择“复制数据”磁贴以启动“复制数据”工具 。In the Get started page, select the Copy Data tile to launch the Copy Data tool.

  2. 在“属性”页中,为“任务名称”字段指定“CopyFromSQLToSQLDW”,然后选择“下一步” 。In the Properties page, specify CopyFromSQLToSQLDW for the Task name field, and select Next.

    “属性”页

  3. 在“源数据存储”页上,完成以下步骤:In the Source data store page, complete the following steps:

    提示

    本教程使用“SQL 身份验证”作为源数据存储的身份验证类型,但你可以根据需要选择其他受支持的身份验证方法:“服务主体”和“托管标识”。In this tutorial, you use SQL authentication as the authentication type for your source data store, but you can choose other supported authentication methods:Service Principal and Managed Identity if needed. 有关详细信息,请参阅此文中的相应部分。Refer to corresponding sections in this article for details. 为了安全地存储数据存储的机密,我们还建议使用 Azure Key Vault。To store secrets for data stores securely, it's also recommended to use an Azure Key Vault. 有关详细说明,请参阅此文Refer to this article for detailed illustrations.

    a.a. 单击“+ 创建新连接”。click + Create new connection.

    b.b. 从库中选择“Azure SQL 数据库”,然后选择“继续” 。Select Azure SQL Database from the gallery, and select Continue. 可以在搜索框中键入“SQL”以筛选连接器。You can type "SQL" in the search box to filter the connectors.

    选择 Azure SQL DB

    c.c. 在“新建链接服务”页上,从下拉列表中选择服务器名称和 DB 名称,指定用户名和密码。In the New Linked Service page, select your server name and DB name from the dropdown list, and specify the username and password. 单击“测试连接”以验证设置,然后选择“创建” 。Click Test connection to validate the settings, then select Create.

    配置 Azure SQL DB

    d.d. 选择新创建的链接服务作为源,然后单击“下一步”。Select the newly created linked service as source, then click Next.

  4. 在“选择要从中复制数据的表或使用自定义查询”页中,输入 SalesLT 以筛选表。In the Select tables from which to copy the data or use a custom query page, enter SalesLT to filter the tables. 选中“(全选)”复选框以便对副本使用所有表,然后选择“下一步”。Choose the (Select all) box to use all of the tables for the copy, and then select Next.

    选择源表

  5. 在“应用筛选器”页中,指定你的设置或选择“下一步”。In the Apply filter page, specify your settings or select Next.

  6. 在“目标数据存储”页上,完成以下步骤:In the Destination data store page, complete the following steps:

    提示

    本教程使用“SQL 身份验证”作为目标数据存储的身份验证类型,但你可以根据需要选择其他受支持的身份验证方法:“服务主体”和“托管标识”。In this tutorial, you use SQL authentication as the authentication type for your destination data store, but you can choose other supported authentication methods:Service Principal and Managed Identity if needed. 有关详细信息,请参阅此文中的相应部分。Refer to corresponding sections in this article for details. 为了安全地存储数据存储的机密,我们还建议使用 Azure Key Vault。To store secrets for data stores securely, it's also recommended to use an Azure Key Vault. 有关详细说明,请参阅此文Refer to this article for detailed illustrations.

    a.a. 单击“+ 创建新连接”来添加连接Click + Create new connection to add a connection

    b.b. 从库中选择“Azure Synapse Analytics (以前称为 SQL DW)”,然后选择“继续”。Select Azure Synapse Analytics (formerly SQL DW) from the gallery, and select Continue. 可以在搜索框中键入“SQL”以筛选连接器。You can type "SQL" in the search box to filter the connectors.

    选择 Azure SQL DW

    c.c. 在“新建链接服务”页上,从下拉列表中选择服务器名称和 DB 名称,指定用户名和密码。In the New Linked Service page, select your server name and DB name from the dropdown list, and specify the username and password. 单击“测试连接”以验证设置,然后选择“创建” 。Click Test connection to validate the settings, then select Create.

    配置 Azure SQL DW

    d.d. 选择新创建的链接服务作为接收器,然后单击“下一步”。Select the newly created linked service as sink, then click Next.

  7. 在“表映射”页中查看内容并选择“下一步”。In the Table mapping page, review the content, and select Next. 此时会显示智能表映射。An intelligent table mapping displays. 源表已根据表名映射到目标表。The source tables are mapped to the destination tables based on the table names. 如果目标中不存在表,则默认情况下 Azure 数据工厂将创建一个具有相同名称的目标表。If a source table doesn't exist in the destination, Azure Data Factory creates a destination table with the same name by default. 还可以将源表映射到现有目标表。You can also map a source table to an existing destination table.

    备注

    当 SQL Server 或 Azure SQL 数据库为源时,将应用 Azure Synapse Analytics 接收器的自动创建表的功能。Automatic table creation for the Azure Synapse Analytics sink applies when SQL Server or Azure SQL Database is the source. 若从其他源数据存储复制数据,则在执行数据复制操作前,需先在接收器 Azure Synapse Analytics 中预创建架构。If you copy data from another source data store, you need to pre-create the schema in the sink Azure Synapse Analytics before executing the data copy.

    “表映射”页

  8. 在“列映射”页中,查看内容并选择“下一步”。In the Column mapping page, review the content, and select Next. 智能表映射基于列名。The intelligent table mapping is based on the column name. 如果自动创建的表的数据工厂,源和目标存储之间存在不兼容时,可能发生数据类型转换。If you let Data Factory automatically create the tables, data type conversion can occur when there are incompatibilities between the source and destination stores. 如果在源列与目标列之间进行不受支持的数据类型转换,会显示错误消息以及相应的表。If there's an unsupported data type conversion between the source and destination column, you see an error message next to the corresponding table.

    “列映射”页

  9. 在“设置”页上,完成以下步骤:In the Settings page, complete the following steps:

    a.a. 在“暂存设置”部分,单击“+ 新建”,新建临时存储 。In Staging settings section, click + New to new a staging storage. 该存储用于在通过 PolyBase 将数据加载至 Azure Synapse Analytics 前暂存数据。The storage is used for staging the data before it loads into Azure Synapse Analytics by using PolyBase. 复制完成后,会自动清除 Azure Blob 存储中的临时数据。After the copy is complete, the interim data in Azure Blob Storage is automatically cleaned up.

    b.b. 在“新建链接服务”页上,选择你的存储帐户,然后选择“创建”以部署链接服务 。In the New Linked Service page, select your storage account, and select Create to deploy the linked service.

    c.c. 在“高级设置”部分,取消选择“使用默认类型”选项,然后选择“下一步” 。In the Advanced settings section, deselect the Use type default option, then select Next.

    配置 PolyBase

  10. 在“摘要”页中检查设置,然后选择“下一步”。In the Summary page, review the settings, and select Next.

    “摘要”页

  11. 在“部署”页中,选择“监视”可以监视管道(任务) 。On the Deployment page, select Monitor to monitor the pipeline (task).

  12. 请注意,界面中已自动选择左侧的“监视”选项卡。Notice that the Monitor tab on the left is automatically selected. 管道运行成功完成后,在“管道名称”列下选择“CopyFromSQLToSQLDW”链接即可查看活动运行详细信息或重新运行该管道。When the pipeline run completes successfully, select the CopyFromSQLToSQLDW link under the PIPELINE NAME column to view activity run details or to rerun the pipeline.

    监视管道运行Monitor pipeline runs

  13. 若要切换回到管道运行视图,请选择顶部的“所有管道运行”链接。To switch back to the pipeline runs view, select the All pipeline runs link at the top. 选择“刷新”可刷新列表。Select Refresh to refresh the list.

    监视活动运行

  14. 若要监视每个复制活动的执行详情,请在活动运行视图中选择“活动名称”下的“详细信息”链接(眼镜图标) 。To monitor the execution details for each copy activity, select the Details link (eyeglasses icon) under ACTIVITY NAME in the activity runs view. 可以监视详细信息,例如,从源复制到接收器的数据量、吞吐量、执行步骤以及相应的持续时间和使用的配置。You can monitor details like the volume of data copied from the source to the sink, data throughput, execution steps with corresponding duration, and used configurations. 监视活动运行详细信息Monitor activity run details

    监视活动运行详细信息

后续步骤Next steps

请转至下列文章,了解 Azure Synapse Analytics 支持:Advance to the following article to learn about Azure Synapse Analytics support: