Copy data from Azure Blob storage to a SQL Database by using the Copy Data tool

APPLIES TO: ✔️ Azure Data Factory ✖️ Azure Synapse Analytics (Preview)

In this tutorial, you use the Azure portal to create a data factory. Then you use the Copy Data tool to create a pipeline that copies data from Azure Blob storage to a SQL Database.

Note

If you're new to Azure Data Factory, see Introduction to Azure Data Factory.

In this tutorial, you perform the following steps:

  • Create a data factory.
  • Use the Copy Data tool to create a pipeline.
  • Monitor the pipeline and activity runs.

Prerequisites

  • Azure subscription: If you don't have an Azure subscription, create a 1-RMB trial account before you begin.
  • Azure Storage account: Use Blob storage as the source data store. If you don't have an Azure Storage account, see the instructions in Create a storage account.
  • Azure SQL Database: Use a SQL Database as the sink data store. If you don't have a SQL Database, see the instructions in Create a SQL Database.

Create a blob and a SQL table

Prepare your Blob storage and your SQL Database for the tutorial by performing these steps.

Create a source blob

  1. Launch Notepad. Copy the following text and save it in a file named inputEmp.txt on your disk:

    FirstName|LastName
    John|Doe
    Jane|Doe
    
  2. Create a container named adfv2tutorial and upload the inputEmp.txt file to the container. You can use the Azure portal or various tools like Azure Storage Explorer to perform these tasks.
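
If you prefer scripting, the same file can be created and uploaded with the Azure CLI. This is a sketch, not the tutorial's prescribed method: it assumes you have run `az login`, and the storage account name is a placeholder. The blob is uploaded under an input/ prefix, which matches the adfv2tutorial/input folder browsed to later in the tutorial.

```shell
# Create the sample source file locally.
printf 'FirstName|LastName\nJohn|Doe\nJane|Doe\n' > inputEmp.txt

# Placeholder: replace with your storage account name.
STORAGE_ACCOUNT="<yourstorageaccount>"

# Create the container and upload the file, but only if the Azure CLI
# is available and a subscription is currently signed in.
if command -v az >/dev/null 2>&1 && az account show >/dev/null 2>&1; then
  az storage container create --name adfv2tutorial --account-name "$STORAGE_ACCOUNT"
  az storage blob upload --container-name adfv2tutorial --name input/inputEmp.txt \
    --file inputEmp.txt --account-name "$STORAGE_ACCOUNT"
fi
```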

Create a sink SQL table

  1. Use the following SQL script to create a table named dbo.emp in your SQL Database:

    CREATE TABLE dbo.emp
    (
        ID int IDENTITY(1,1) NOT NULL,
        FirstName varchar(50),
        LastName varchar(50)
    )
    GO
    
    CREATE CLUSTERED INDEX IX_emp_ID ON dbo.emp (ID);
    
  2. Allow Azure services to access SQL Server. Verify that the setting Allow Azure services and resources to access this server is enabled for the server that's running SQL Database. This setting lets Data Factory write data to your database instance. To verify and turn on this setting, go to your logical SQL server > Security > Firewalls and virtual networks, and set the Allow Azure services and resources to access this server option to ON.
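
The same setting can also be turned on from the command line. The following Azure CLI sketch assumes placeholder resource group and server names; a firewall rule spanning 0.0.0.0 to 0.0.0.0 is the conventional representation of "Allow Azure services and resources to access this server":

```shell
# Placeholders: replace with your resource group and logical SQL server name.
RESOURCE_GROUP="<your-resource-group>"
SQL_SERVER="<your-sql-server>"

# The special 0.0.0.0-0.0.0.0 range enables access for Azure services.
az sql server firewall-rule create \
  --resource-group "$RESOURCE_GROUP" \
  --server "$SQL_SERVER" \
  --name AllowAllWindowsAzureIps \
  --start-ip-address 0.0.0.0 \
  --end-ip-address 0.0.0.0
```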

Create a data factory

  1. On the left menu, select Create a resource > Data + Analytics > Data Factory:

    New data factory

  2. On the New data factory page, under Name, enter ADFTutorialDataFactory.

    The name for your data factory must be globally unique. You might receive the following error message:

    New data factory error message

    If you receive an error message about the name value, enter a different name for the data factory. For example, use the name yournameADFTutorialDataFactory. For the naming rules for Data Factory artifacts, see Data Factory naming rules.

  3. Select the Azure subscription in which to create the new data factory.

  4. For Resource Group, take one of the following steps:

    a. Select Use existing, and select an existing resource group from the drop-down list.

    b. Select Create new, and enter the name of a resource group.

    To learn about resource groups, see Use resource groups to manage your Azure resources.

  5. Under version, select V2 for the version.

  6. Under location, select the location for the data factory. Only supported locations are displayed in the drop-down list. The data stores (for example, Azure Storage and SQL Database) and computes (for example, Azure HDInsight) that are used by your data factory can be in other locations and regions.

  7. Select Create.

  8. After creation is finished, the Data Factory home page is displayed.

    Data Factory home page

  9. To launch the Azure Data Factory user interface (UI) in a separate tab, select the Author & Monitor tile.
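
The factory can also be created from the command line instead of the portal. This is a sketch using the Azure CLI datafactory extension with placeholder names; parameter names can vary slightly across extension versions:

```shell
# One-time: add the Data Factory extension to the Azure CLI.
az extension add --name datafactory

# Placeholders: use your own resource group, a globally unique factory
# name, and a supported region.
az datafactory create \
  --resource-group "<your-resource-group>" \
  --name "yournameADFTutorialDataFactory" \
  --location "East US"
```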

Use the Copy Data tool to create a pipeline

  1. On the Let's get started page, select the Copy Data tile to launch the Copy Data tool.

    Copy Data tool tile

  2. On the Properties page, under Task name, enter CopyFromBlobToSqlPipeline. Then select Next. The Data Factory UI creates a pipeline with the specified task name.

  3. On the Source data store page, complete the following steps:

    a. Select + Create new connection to add a connection.

    b. Select Azure Blob Storage from the gallery, and then select Continue.

    c. On the New linked service page, select your Azure subscription, and select your storage account from the Storage account name list. Test the connection, and then select Create.

    d. Select the newly created linked service as the source, and then select Next.

    Select source linked service

  4. On the Choose the input file or folder page, complete the following steps:

    a. Select Browse to navigate to the adfv2tutorial/input folder, select the inputEmp.txt file, and then select Choose.

    b. Select Next to move to the next step.

  5. On the File format settings page, select the check box for First row as header. Notice that the tool automatically detects the column and row delimiters, and that you can also preview data and view the schema of the input data on this page. Then select Next.

    File format settings

  6. On the Destination data store page, complete the following steps:

    a. Select + Create new connection to add a connection.

    b. Select Azure SQL Database from the gallery, and then select Continue.

    c. On the New linked service page, select your server name and database name from the drop-down lists, specify the user name and password, and then select Create.

    Configure Azure SQL Database

    d. Select the newly created linked service as the sink, and then select Next.

  7. On the Table mapping page, select the [dbo].[emp] table, and then select Next.

  8. On the Column mapping page, notice that the second and third columns in the input file are mapped to the FirstName and LastName columns of the emp table. Adjust the mapping to make sure that there are no errors, and then select Next.

    Column mapping page

  9. On the Settings page, select Next.

  10. On the Summary page, review the settings, and then select Next.

  11. On the Deployment page, select Monitor to monitor the pipeline (task).

    Monitor the pipeline

  12. On the Pipeline runs page, select Refresh to refresh the list. Select the link under PIPELINE NAME to view activity run details or to rerun the pipeline.

  13. On the Activity runs page, select the Details link (eyeglasses icon) under the ACTIVITY NAME column for more details about the copy operation. To go back to the Pipeline runs view, select the All pipeline runs link in the breadcrumb menu. To refresh the view, select Refresh.

    Monitor activity runs

  14. Verify that the data is inserted into the dbo.emp table in your SQL Database.

  15. Select the Author tab on the left to switch to editor mode. You can use the editor to update the linked services, datasets, and pipelines that the tool created. For details on editing these entities in the Data Factory UI, see the Azure portal version of this tutorial.

    Select the Author tab
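
For the verification in step 14, a quick query against the sink table is enough, for example:

```sql
-- Expect the two rows from inputEmp.txt; the ID values are
-- generated automatically by the IDENTITY column.
SELECT ID, FirstName, LastName FROM dbo.emp;
```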

Next steps

The pipeline in this sample copies data from Blob storage to a SQL Database. You learned how to:

  • Create a data factory.
  • Use the Copy Data tool to create a pipeline.
  • Monitor the pipeline and activity runs.

Advance to the following tutorial to learn how to copy data from on-premises to the cloud: