Copy data from a SQL Server database to Azure Blob storage

APPLIES TO: ✓ Azure Data Factory  ✗ Azure Synapse Analytics (Preview)

In this tutorial, you use the Azure Data Factory user interface (UI) to create a data factory pipeline that copies data from a SQL Server database to Azure Blob storage. You also create and use a self-hosted integration runtime, which moves data between on-premises and cloud data stores.

Note

This article doesn't provide a detailed introduction to Data Factory. For more information, see Introduction to Data Factory.

In this tutorial, you perform the following steps:

  • Create a data factory.
  • Create a self-hosted integration runtime.
  • Create SQL Server and Azure Storage linked services.
  • Create SQL Server and Azure Blob datasets.
  • Create a pipeline with a copy activity to move the data.
  • Start a pipeline run.
  • Monitor the pipeline run.

Prerequisites

Azure subscription

Before you begin, if you don't already have an Azure subscription, create a 1-RMB trial account.

Azure roles

To create data factory instances, the user account you use to sign in to Azure must be assigned a Contributor or Owner role, or must be an administrator of the Azure subscription.

To view the permissions you have in the subscription, go to the Azure portal. In the upper-right corner, select your user name, and then select Permissions. If you have access to multiple subscriptions, select the appropriate one. For sample instructions on how to add a user to a role, see Manage access using RBAC and the Azure portal.

SQL Server 2014, 2016, and 2017

In this tutorial, you use a SQL Server database as the source data store. The pipeline in the data factory you create in this tutorial copies data from this SQL Server database (source) to Blob storage (sink). You then create a table named emp in your SQL Server database and insert a couple of sample entries into it.

  1. Start SQL Server Management Studio. If it's not already installed on your machine, go to Download SQL Server Management Studio.

  2. Connect to your SQL Server instance by using your credentials.

  3. Create a sample database. In the tree view, right-click Databases, and then select New Database.

  4. In the New Database window, enter a name for the database, and then select OK.

  5. To create the emp table and insert some sample data into it, run the following query script against the database. In the tree view, right-click the database that you created, and then select New Query.

     CREATE TABLE dbo.emp
     (
         ID int IDENTITY(1,1) NOT NULL,
         FirstName varchar(50),
         LastName varchar(50)
     )
     GO
    
     INSERT INTO emp (FirstName, LastName) VALUES ('John', 'Doe')
     INSERT INTO emp (FirstName, LastName) VALUES ('Jane', 'Doe')
     GO
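You don't need a SQL Server instance just to experiment with this table shape. The following is a minimal local sketch that mirrors the emp table and its two sample rows using Python's built-in sqlite3 module; note that SQLite syntax differs from T-SQL (AUTOINCREMENT instead of IDENTITY), so this only illustrates the data the pipeline will copy, not the exact server-side DDL.

```python
import sqlite3

# In-memory stand-in for the SQL Server sample database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SQLite equivalent of the T-SQL table above (IDENTITY(1,1) -> AUTOINCREMENT).
cur.execute("""
    CREATE TABLE emp (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        FirstName TEXT,
        LastName TEXT
    )
""")
cur.executemany(
    "INSERT INTO emp (FirstName, LastName) VALUES (?, ?)",
    [("John", "Doe"), ("Jane", "Doe")],
)
conn.commit()

rows = cur.execute("SELECT ID, FirstName, LastName FROM emp").fetchall()
print(rows)  # [(1, 'John', 'Doe'), (2, 'Jane', 'Doe')]
```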
    

Azure storage account

In this tutorial, you use a general-purpose Azure storage account (specifically, Blob storage) as a destination/sink data store. If you don't have a general-purpose Azure storage account, see Create a storage account. The pipeline in the data factory that you create in this tutorial copies data from the SQL Server database (source) to Blob storage (sink).

Get the storage account name and account key

You use the name and key of your storage account in this tutorial. To get them, take the following steps:

  1. Sign in to the Azure portal with your Azure user name and password.

  2. In the left pane, select All services. Filter by using the Storage keyword, and then select Storage accounts.

    (Screenshot: storage accounts search.)

  3. In the list of storage accounts, filter for your storage account if needed, and then select it.

  4. In the Storage account window, select Access keys.

  5. Copy the values in the Storage account name and key1 boxes, and then paste them into Notepad or another editor for later use in the tutorial.

Create the adftutorial container

In this section, you create a blob container named adftutorial in your Blob storage.

  1. In the Storage account window, go to Overview, and then select Containers.

    (Screenshot: selecting the Containers option.)

  2. In the Containers window, select + Container to create a new one.

  3. In the New container window, under Name, enter adftutorial. Then select Create.

  4. In the list of containers, select the adftutorial container you just created.

  5. Keep the container window for adftutorial open. You use it to verify the output at the end of the tutorial. Data Factory automatically creates the output folder in this container, so you don't need to create one.

Create a data factory

In this step, you create a data factory and start the Data Factory UI to create a pipeline in the data factory.

  1. Open the Microsoft Edge or Google Chrome web browser. Currently, the Data Factory UI is supported only in Microsoft Edge and Google Chrome web browsers.

  2. On the left menu, select Create a resource > Data + Analytics > Data Factory:

    (Screenshot: selecting Data Factory in the New pane.)

  3. On the New data factory page, under Name, enter ADFTutorialDataFactory.

    The name of the data factory must be globally unique. If you see an error message for the name field, change the name of the data factory (for example, yournameADFTutorialDataFactory). For naming rules for Data Factory artifacts, see Data Factory naming rules.

    (Screenshot: naming the new data factory.)

  4. Select the Azure subscription in which you want to create the data factory.

  5. For Resource Group, take one of the following steps:

    • Select Use existing, and select an existing resource group from the drop-down list.

    • Select Create new, and enter the name of a resource group.

      To learn about resource groups, see Use resource groups to manage your Azure resources.

  6. Under Version, select V2.

  7. Under Location, select the location for the data factory. Only supported locations are displayed in the drop-down list. The data stores (for example, Storage and SQL Database) and computes (for example, Azure HDInsight) used by Data Factory can be in other regions.

  8. Select Create.

  9. After the creation is finished, you see the Data Factory page as shown in the image:

    (Screenshot: the data factory home page.)

  10. Select the Author & Monitor tile to launch the Data Factory UI in a separate tab.

Create a pipeline

  1. On the Let's get started page, select Create pipeline. A pipeline is automatically created for you. You see the pipeline in the tree view, and its editor opens.

    (Screenshot: the Let's get started page.)

  2. In the General panel under Properties, specify SQLServerToBlobPipeline for Name. Then collapse the panel by clicking the Properties icon in the top-right corner.

  3. In the Activities tool box, expand Move & Transform. Drag the Copy activity to the pipeline design surface. Set the name of the activity to CopySqlServerToAzureBlobActivity.

  4. In the Properties window, go to the Source tab, and select + New.

  5. In the New Dataset dialog box, search for SQL Server. Select SQL Server, and then select Continue.

  6. In the Set Properties dialog box, under Name, enter SqlServerDataset. Under Linked service, select + New. You create a connection to the source data store (the SQL Server database) in this step.

  7. In the New Linked Service dialog box, enter SqlServerLinkedService for Name. Under Connect via integration runtime, select + New. In this section, you create a self-hosted integration runtime and associate it with the on-premises machine that hosts the SQL Server database. The self-hosted integration runtime is the component that copies data from the SQL Server database on your machine to Blob storage.

  8. In the Integration Runtime Setup dialog box, select Self-Hosted, and then select Continue.

  9. Under Name, enter TutorialIntegrationRuntime. Then select Create.

  10. For Settings, select Click here to launch the express setup for this computer. This action installs the integration runtime on your machine and registers it with Data Factory. Alternatively, you can use the manual setup option to download the installation file, run it, and use the key to register the integration runtime.

  11. In the Integration Runtime (Self-hosted) Express Setup window, select Close when the process is finished.

    (Screenshot: Integration Runtime (Self-hosted) express setup.)

  12. In the New linked service (SQL Server) dialog box, confirm that TutorialIntegrationRuntime is selected under Connect via integration runtime. Then take the following steps:

    a. Under Name, enter SqlServerLinkedService.

    b. Under Server name, enter the name of your SQL Server instance.

    c. Under Database name, enter the name of the database with the emp table.

    d. Under Authentication type, select the appropriate authentication type that Data Factory should use to connect to your SQL Server database.

    e. Under User name and Password, enter the user name and password. If you need to use a backslash (\) in your user account or server name, precede it with the escape character (\). For example, use mydomain\\myuser.

    f. Select Test connection. This step confirms that Data Factory can connect to your SQL Server database by using the self-hosted integration runtime you created.

    g. To save the linked service, select Create.

    (Screenshot: the New linked service (SQL Server) dialog box.)

  13. After the linked service is created, you're back on the Set properties page for SqlServerDataset. Take the following steps:

    a. In Linked service, confirm that you see SqlServerLinkedService.

    b. Under Table name, select [dbo].[emp].

    c. Select OK.

  14. Go to the tab with SQLServerToBlobPipeline, or select SQLServerToBlobPipeline in the tree view.

  15. Go to the Sink tab at the bottom of the Properties window, and select + New.

  16. In the New Dataset dialog box, select Azure Blob Storage. Then select Continue.

  17. In the Select Format dialog box, choose the format type of your data. Then select Continue.

    (Screenshot: data format selection.)

  18. In the Set Properties dialog box, enter AzureBlobDataset for Name. Next to the Linked service text box, select + New.

  19. In the New Linked Service (Azure Blob Storage) dialog box, enter AzureStorageLinkedService as the name, and select your storage account from the Storage account name list. Test the connection, and then select Create to deploy the linked service.

  20. After the linked service is created, you're back on the Set properties page. Select OK.

  21. Open the sink dataset. On the Connection tab, take the following steps:

    a. In Linked service, confirm that AzureStorageLinkedService is selected.

    b. In File path, enter adftutorial/fromonprem for the Container/Directory part. If the fromonprem folder doesn't exist in the adftutorial container, Data Factory automatically creates it.

    c. For the File part, select Add dynamic content.

    d. Add @CONCAT(pipeline().RunId, '.txt'), and then select Finish. This expression names the output file after the pipeline run ID, as PipelineRunID.txt.

  22. Go to the tab with the pipeline opened, or select the pipeline in the tree view. In Sink Dataset, confirm that AzureBlobDataset is selected.

  23. To validate the pipeline settings, select Validate on the toolbar for the pipeline. To close the Pipeline Validation output, select the >> icon.

  24. To publish the entities you created to Data Factory, select Publish all.

  25. Wait until you see the Publishing completed pop-up. To check the status of publishing, select the Show notifications link at the top of the window. To close the notification window, select Close.
Trigger a pipeline run

Select Add Trigger on the toolbar for the pipeline, and then select Trigger Now.

Monitor the pipeline run

  1. Go to the Monitor tab. You see the pipeline that you manually triggered in the previous step.

  2. To view activity runs associated with the pipeline run, select the SQLServerToBlobPipeline link under PIPELINE NAME.

  3. On the Activity runs page, select the Details (eyeglasses) link to see details about the copy operation. To go back to the Pipeline Runs view, select All pipeline runs at the top.

Verify the output

The pipeline automatically creates an output folder named fromonprem in the adftutorial blob container. Confirm that you see a [pipeline().RunId].txt file in the output folder.
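If you'd rather check the result programmatically than in the portal, you can match blob names against the run-ID pattern the sink expression produces. This is a minimal sketch under stated assumptions: the blob names below are made up, and in practice you would obtain the listing from the adftutorial container (for example, with a storage SDK or the Azure portal), then apply the same filter.

```python
import re

# GUID followed by .txt: the shape produced by @CONCAT(pipeline().RunId, '.txt').
RUN_ID_FILE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.txt$",
    re.IGNORECASE,
)

def find_run_output(blob_names):
    """Return blob names under fromonprem/ that look like pipeline output files."""
    return [
        name for name in blob_names
        if name.startswith("fromonprem/")
        and RUN_ID_FILE.match(name.split("/", 1)[1])
    ]

# Hypothetical container listing for illustration.
blobs = [
    "fromonprem/6a9b4e1c-0d2f-4f3a-9c8e-1b2d3e4f5a6b.txt",
    "fromonprem/notes.md",
]
print(find_run_output(blobs))  # ['fromonprem/6a9b4e1c-0d2f-4f3a-9c8e-1b2d3e4f5a6b.txt']
```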

Next steps

The pipeline in this sample copies data from a SQL Server database to Blob storage. You learned how to:

  • Create a data factory.
  • Create a self-hosted integration runtime.
  • Create SQL Server and Storage linked services.
  • Create SQL Server and Blob storage datasets.
  • Create a pipeline with a copy activity to move the data.
  • Start a pipeline run.
  • Monitor the pipeline run.

For a list of data stores that are supported by Data Factory, see Supported data stores.

To learn how to copy data in bulk from a source to a destination, advance to the following tutorial: