Copy data from a SQL Server database to Azure Blob storage by using the Copy Data tool

APPLIES TO: Azure Data Factory. Does not apply to Azure Synapse Analytics (Preview).

In this tutorial, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from a SQL Server database to Azure Blob storage.


In this tutorial, you perform the following steps:

  • Create a data factory.
  • Use the Copy Data tool to create a pipeline.
  • Monitor the pipeline and activity runs.

Prerequisites

Azure subscription

If you don't already have an Azure subscription, create a 1-RMB trial account before you begin.

Azure roles

To create data factory instances, the user account you use to sign in to Azure must be assigned a Contributor or Owner role, or must be an administrator of the Azure subscription.

To view the permissions that you have in the subscription, go to the Azure portal. Select your user name in the upper-right corner, and then select Permissions. If you have access to multiple subscriptions, select the appropriate subscription. For sample instructions on how to add a user to a role, see Manage access using RBAC and the Azure portal.

SQL Server 2014, 2016, and 2017

In this tutorial, you use a SQL Server database as a source data store. The pipeline in the data factory that you create in this tutorial copies data from this SQL Server database (source) to Blob storage (sink). You then create a table named emp in your SQL Server database and insert a couple of sample entries into the table.

  1. Start SQL Server Management Studio. If it's not already installed on your machine, go to Download SQL Server Management Studio.

  2. Connect to your SQL Server instance by using your credentials.

  3. Create a sample database. In the tree view, right-click Databases, and then select New Database.

  4. In the New Database window, enter a name for the database, and then select OK.

  5. To create the emp table and insert some sample data into it, run the following query script against the database. In the tree view, right-click the database that you created, and then select New Query.

    CREATE TABLE dbo.emp
    (
        ID int IDENTITY(1,1) NOT NULL,
        FirstName varchar(50),
        LastName varchar(50)
    )
    GO
    
    INSERT INTO emp (FirstName, LastName) VALUES ('John', 'Doe')
    INSERT INTO emp (FirstName, LastName) VALUES ('Jane', 'Doe')
    GO
    

Azure storage account

In this tutorial, you use a general-purpose Azure storage account (specifically, Blob storage) as a destination/sink data store. If you don't have a general-purpose storage account, see Create a storage account for instructions to create one. The pipeline in the data factory that you create in this tutorial copies data from the SQL Server database (source) to this Blob storage (sink).

Get the storage account name and account key

You use the name and key of your storage account in this tutorial. To get the name and key of your storage account, take the following steps:

  1. Sign in to the Azure portal with your Azure user name and password.

  2. In the left pane, select All services. Filter by using the Storage keyword, and then select Storage accounts.

    Screenshot: Storage account search

  3. In the list of storage accounts, filter for your storage account, if needed. Then select your storage account.

  4. In the Storage account window, select Access keys.

  5. Copy the values in the Storage account name and key1 boxes, and then paste them into Notepad or another editor for later use in the tutorial.
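Tools that connect to the storage account programmatically typically combine the account name and key you just copied into a connection string. The following is a minimal sketch of that format; the endpoint suffix shown is for global Azure (sovereign clouds such as Azure China use a different suffix), and the name and key values are placeholders:

```python
def build_storage_connection_string(account_name: str, account_key: str,
                                    endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble an Azure Storage connection string from an account name and key."""
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        f"EndpointSuffix={endpoint_suffix}"
    )

# Placeholder values -- substitute the Storage account name and key1 values
# you copied from the portal.
conn = build_storage_connection_string("mystorageaccount", "<account-key>")
print(conn)
```

Treat the key like a password: keep it out of source control and rotate it if it is ever exposed.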

Create the adftutorial container

In this section, you create a blob container named adftutorial in your Blob storage.

  1. In the Storage account window, switch to Overview, and then select Blobs.

  2. In the Blobs window, select + Container.

  3. In the New container window, under Name, enter adftutorial, and then select OK.

  4. In the list of containers, select adftutorial.

  5. Keep the Container window for adftutorial open. You use it to verify the output at the end of the tutorial. Data Factory automatically creates the output folder in this container, so you don't need to create one.

Create a data factory

  1. On the menu on the left, select + Create a resource > Data + Analytics > Data Factory.

    Screenshot: New data factory

  2. On the New data factory page, under Name, enter ADFTutorialDataFactory.

    The name of the data factory must be globally unique. If you see the following error message for the name field, change the name of the data factory (for example, to yournameADFTutorialDataFactory). For naming rules for Data Factory artifacts, see Data Factory naming rules.

    Screenshot: New data factory name error

  3. Select the Azure subscription in which you want to create the data factory.

  4. For Resource Group, take one of the following steps:

    • Select Use existing, and select an existing resource group from the drop-down list.

    • Select Create new, and enter the name of a resource group.

      To learn about resource groups, see Use resource groups to manage your Azure resources.

  5. Under Version, select V2.

  6. Under Location, select the location for the data factory. Only supported locations are displayed in the drop-down list. The data stores (for example, Azure Storage and SQL Database) and computes (for example, Azure HDInsight) used by Data Factory can be in other locations/regions.

  7. Select Create.

  8. After the creation is finished, you see the Data Factory page as shown in the image.

    Screenshot: Data factory home page

  9. Select Author & Monitor to launch the Data Factory user interface in a separate tab.
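The naming rules referenced in step 2 can be approximated in code. The sketch below checks the commonly documented constraints for data factory names (3-63 characters; letters, numbers, and hyphens only; starting and ending with a letter or number); treat the Data Factory naming rules article as authoritative:

```python
import re

# Approximate Data Factory naming constraints (see the naming rules article
# for the authoritative list): 3-63 characters, alphanumerics and hyphens,
# starting and ending with an alphanumeric character.
_NAME_PATTERN = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{1,61}[A-Za-z0-9])?$")

def is_valid_factory_name(name: str) -> bool:
    return len(name) >= 3 and _NAME_PATTERN.fullmatch(name) is not None

print(is_valid_factory_name("ADFTutorialDataFactory"))  # True
print(is_valid_factory_name("-bad-name-"))              # False: hyphens at the ends
```

A locally valid name can still be rejected by the portal, because the name must also be globally unique across all data factories.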

Use the Copy Data tool to create a pipeline

  1. On the Let's get started page, select Copy Data to launch the Copy Data tool.

    Screenshot: Get started page

  2. On the Properties page of the Copy Data tool, under Task name, enter CopyFromOnPremSqlToAzureBlobPipeline. Then select Next. The Copy Data tool creates a pipeline with the name you specify for this field.

    Screenshot: Task name

  3. On the Source data store page, select Create new connection.

  4. Under New Linked Service, search for SQL Server, and then select Continue.

  5. In the New Linked Service (SQL Server) dialog box, under Name, enter SqlServerLinkedService. Under Connect via integration runtime, select +New. You must create a self-hosted integration runtime, download it to your machine, and register it with Data Factory. The self-hosted integration runtime copies data between your on-premises environment and the cloud.

  6. In the Integration Runtime Setup dialog box, select Self-Hosted. Then select Continue.

    Screenshot: Create integration runtime

  7. In the Integration Runtime Setup dialog box, under Name, enter TutorialIntegrationRuntime. Then select Create.

  8. In the Integration Runtime Setup dialog box, select Click here to launch the express setup for this computer. This action installs the integration runtime on your machine and registers it with Data Factory. Alternatively, you can use the manual setup option to download the installation file, run it, and use the key to register the integration runtime.

  9. Run the downloaded application. You see the status of the express setup in the window.

    Screenshot: Express setup status

  10. In the New Linked Service (SQL Server) dialog box, confirm that TutorialIntegrationRuntime is selected for the Integration Runtime field. Then take the following steps:

    a. Under Name, enter SqlServerLinkedService.

    b. Under Server name, enter the name of your SQL Server instance.

    c. Under Database name, enter the name of your on-premises database.

    d. Under Authentication type, select the appropriate authentication type.

    e. Under User name, enter the name of a user with access to SQL Server.

    f. Enter the password for the user.

    g. Test the connection, and then select Finish.

    Screenshot: Selected integration runtime

  11. On the Source data store page, select Next.

  12. On the Select tables from which to copy the data or use a custom query page, select the [dbo].[emp] table in the list, and then select Next. You can select any other table, based on your database.

  13. On the Destination data store page, select Create new connection.

  14. In New Linked Service, search for and select Azure Blob, and then select Continue.

    Screenshot: Blob storage selection

  15. In the New Linked Service (Azure Blob Storage) dialog box, take the following steps:

    a. Under Name, enter AzureStorageLinkedService.

    b. Under Connect via integration runtime, select TutorialIntegrationRuntime.

    c. Under Storage account name, select your storage account from the drop-down list.

    d. Select Finish.

  16. In the Destination data store dialog box, make sure that Azure Blob Storage is selected. Then select Next.

  17. In the Choose the output file or folder dialog box, under Folder path, enter adftutorial/fromonprem. You created the adftutorial container as part of the prerequisites. If the output folder (in this case, fromonprem) doesn't exist, Data Factory automatically creates it. You can also use the Browse button to browse the blob storage and its containers/folders. If you don't specify a value under File name, by default the name from the source is used (in this case, dbo.emp).

    Screenshot: Choose the output file or folder

  18. In the File format settings dialog box, select Next.

  19. In the Settings dialog box, select Next.

  20. In the Summary dialog box, review the values for all the settings, and then select Next.

  21. On the Deployment page, select Monitor to monitor the pipeline (task).

  22. When the pipeline run completes, you can view the status of the pipeline that you created.

  23. On the Pipeline runs page, select Refresh to refresh the list. Select the link under PIPELINE NAME to view activity run details or to rerun the pipeline.

  24. On the Activity runs page, select the Details link (eyeglasses icon) under the ACTIVITY NAME column for more details about the copy operation. To go back to the Pipeline runs view, select the All pipeline runs link in the breadcrumb menu. To refresh the view, select Refresh.

  25. Confirm that you see the output file in the fromonprem folder of the adftutorial container.

  26. Select the Edit tab on the left to switch to editor mode. You can use the editor to update the linked services, datasets, and pipelines that the tool created. Select Code to view the JSON code associated with the entity that's open in the editor. For details on how to edit these entities in the Data Factory UI, see the Azure portal version of this tutorial.
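When you select Code in the editor (step 26), the SQL Server linked service configured in steps 5-10 appears as JSON. The Python sketch below builds an approximation of that shape; the exact schema varies by service version, and the connection-string value is a placeholder (the tool stores the password as a secure string, not plain text):

```python
import json

# Approximate JSON shape of the generated SQL Server linked service.
# Property names follow the common Data Factory v2 schema; verify against
# the Code view in your own factory.
sql_server_linked_service = {
    "name": "SqlServerLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            # Placeholder connection string; real values come from steps 10b-10f.
            "connectionString": "Server=<server>;Database=<database>;User ID=<user>",
        },
        "connectVia": {
            "referenceName": "TutorialIntegrationRuntime",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(sql_server_linked_service, indent=2))
```

The connectVia block is what binds the linked service to the self-hosted integration runtime created in steps 6-9.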
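Similarly, the pipeline and its copy activity have a JSON representation in the Code view. This sketch shows an approximate skeleton with dataset references and the output folder from step 17; the activity and dataset names here are illustrative placeholders, since the tool generates its own names:

```python
# Approximate skeleton of the generated copy pipeline (Data Factory v2 style).
# Activity and dataset reference names are illustrative placeholders.
copy_pipeline = {
    "name": "CopyFromOnPremSqlToAzureBlobPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromSqlToBlob",  # illustrative
                "type": "Copy",
                "inputs": [{"referenceName": "SourceSqlDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkBlobDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlSource"},
                    "sink": {"type": "BlobSink"},
                },
            }
        ]
    },
}

# The sink dataset points at the folder path entered in step 17:
sink_folder = "adftutorial/fromonprem"
container, _, folder = sink_folder.partition("/")
print(container, folder)  # adftutorial fromonprem
```

The copy activity references the source and sink datasets by name, so renaming a dataset in the editor requires updating any pipeline that references it.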

Next steps

The pipeline in this sample copies data from a SQL Server database to Blob storage. You learned how to:

  • Create a data factory.
  • Use the Copy Data tool to create a pipeline.
  • Monitor the pipeline and activity runs.

For a list of data stores that are supported by Data Factory, see Supported data stores.

To learn how to copy data in bulk from a source to a destination, advance to the following tutorial: