Quickstart: Use the Copy Data tool to copy data

APPLIES TO: Azure Data Factory. Does not apply to: Azure Synapse Analytics (Preview).

In this quickstart, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from a folder in Azure Blob storage to another folder.

Note

If you are new to Azure Data Factory, see Introduction to Azure Data Factory before doing this quickstart.

Prerequisites

Azure subscription

If you don't have an Azure subscription, create a 1-RMB trial account before you begin.

Azure roles

To create Data Factory instances, the user account that you use to sign in to Azure must be a member of the Contributor or Owner role, or an administrator of the Azure subscription. To view the permissions that you have in the subscription, go to the Azure portal, select your username in the upper-right corner, select the "..." icon for more options, and then select My permissions. If you have access to multiple subscriptions, select the appropriate subscription.

To create and manage child resources for Data Factory - including datasets, linked services, pipelines, triggers, and integration runtimes - the following requirements are applicable:

  • To create and manage child resources in the Azure portal, you must belong to the Data Factory Contributor role at the resource group level or above.
  • To create and manage child resources with PowerShell or the SDK, the Contributor role at the resource level or above is sufficient.

For sample instructions about how to add a user to a role, see the Add roles article.
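
If you script your environment setup, a role such as Data Factory Contributor can also be assigned from code. The sketch below is a rough illustration using the azure-mgmt-authorization package; the subscription ID, resource group, and user object ID are placeholders, and the model shapes vary between SDK versions, so verify them against the version you install.

import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"      # placeholder
resource_group = "<resource-group>"        # placeholder
principal_object_id = "<user-object-id>"   # placeholder: Azure AD object ID of the user

credential = DefaultAzureCredential()
auth_client = AuthorizationManagementClient(credential, subscription_id)

# Scope the assignment to the resource group, as described above.
scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"

# Look up the built-in Data Factory Contributor role definition by name.
role_def = next(auth_client.role_definitions.list(
    scope, filter="roleName eq 'Data Factory Contributor'"))

# Create the role assignment; the assignment name is simply a new GUID.
auth_client.role_assignments.create(
    scope,
    str(uuid.uuid4()),
    RoleAssignmentCreateParameters(
        role_definition_id=role_def.id,
        principal_id=principal_object_id))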

For more information, see the Roles and permissions for Azure Data Factory article.

Azure Storage account

You use a general-purpose Azure Storage account (specifically Blob storage) as both source and destination data stores in this quickstart. If you don't have a general-purpose Azure Storage account, see Create a storage account to create one.
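
If you prefer to create the storage account from code rather than the portal, the following is a minimal sketch using recent versions of the azure-mgmt-storage package; the subscription ID, resource group, account name, and region are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder

storage_client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Create a general-purpose v2 account with locally redundant storage.
poller = storage_client.storage_accounts.begin_create(
    resource_group,
    "<storage-account-name>",           # placeholder: must be globally unique, lowercase
    StorageAccountCreateParameters(
        sku=Sku(name="Standard_LRS"),
        kind="StorageV2",
        location="eastus"))             # pick the region you want

account = poller.result()
print(account.name, account.provisioning_state)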

Get the storage account name

You will need the name of your Azure Storage account for this quickstart. The following procedure provides steps to get the name of your storage account:

  1. In a web browser, go to the Azure portal and sign in using your Azure username and password.
  2. From the Azure portal menu, select All services, then select Storage > Storage accounts. You can also search for and select Storage accounts from any page.
  3. On the Storage accounts page, filter for your storage account (if needed), and then select it.

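If you are scripting instead, the account name and an access key (used to build the connection string in the sketches later in this quickstart) can be retrieved with the azure-mgmt-storage package. A minimal sketch, with the subscription ID, resource group, and account name as placeholders:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder

storage_client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# List the storage accounts in the subscription and note the one you want to use.
for account in storage_client.storage_accounts.list():
    print(account.name, account.location)

# Retrieve an access key for the chosen account and build a connection string.
keys = storage_client.storage_accounts.list_keys(resource_group, "<storage-account-name>")
conn_str = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=<storage-account-name>;"
    f"AccountKey={keys.keys[0].value};"
    "EndpointSuffix=core.windows.net")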

Create a blob container

In this section, you create a blob container named adftutorial in Azure Blob storage.

  1. From the storage account page, select Overview > Containers.

  2. On the toolbar of the <Account name> - Containers page, select Container.

  3. In the New container dialog box, enter adftutorial for the name, and then select OK. The <Account name> - Containers page is updated to include adftutorial in the list of containers.

    List of containers
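
If you prefer to script this step, the container can also be created with the azure-storage-blob package. A minimal sketch follows; it assumes the storage connection string is available in an environment variable named AZURE_STORAGE_CONNECTION_STRING (a name chosen here, not one the portal sets for you).

import os

from azure.storage.blob import BlobServiceClient

# Connection string from the storage account's Access keys blade (or the earlier sketch).
conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]

service = BlobServiceClient.from_connection_string(conn_str)

# Create the container; this raises ResourceExistsError if it already exists.
service.create_container("adftutorial")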

Add an input folder and file for the blob container

In this section, you create a folder named input in the container you just created, and then upload a sample file to the input folder. Before you begin, open a text editor such as Notepad, and create a file named emp.txt with the following content:

John, Doe
Jane, Doe

Save the file in the C:\ADFv2QuickStartPSH folder. (If the folder doesn't already exist, create it.) Then return to the Azure portal and follow these steps:

  1. On the <Account name> - Containers page where you left off, select adftutorial from the updated list of containers.

    1. If you closed the window or went to another page, sign in to the Azure portal again.
    2. From the Azure portal menu, select All services, then select Storage > Storage accounts. You can also search for and select Storage accounts from any page.
    3. Select your storage account, and then select Containers > adftutorial.
  2. On the toolbar of the adftutorial container page, select Upload.

  3. On the Upload blob page, select the Files box, and then browse to and select the emp.txt file.

  4. Expand the Advanced heading. The page now displays as shown:

    Select the Advanced link

  5. In the Upload to folder box, enter input.

  6. Select the Upload button. You should see the emp.txt file and the status of the upload in the list.

  7. Select the Close icon (an X) to close the Upload blob page.

Keep the adftutorial container page open. You use it to verify the output at the end of this quickstart.
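
The upload can also be scripted with azure-storage-blob. A minimal sketch, assuming emp.txt was saved to C:\ADFv2QuickStartPSH as above and the connection string is in the same environment variable as the previous sketch:

import os

from azure.storage.blob import BlobServiceClient

conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = BlobServiceClient.from_connection_string(conn_str)

# Blob storage has no real folders; the "input/" prefix in the blob name acts as the folder.
blob = service.get_blob_client(container="adftutorial", blob="input/emp.txt")

with open(r"C:\ADFv2QuickStartPSH\emp.txt", "rb") as data:
    blob.upload_blob(data, overwrite=True)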

Create a data factory

  1. Launch the Microsoft Edge or Google Chrome web browser. Currently, the Data Factory UI is supported only in Microsoft Edge and Google Chrome.

  2. Go to the Azure portal.

  3. From the Azure portal menu, select Create a resource > Analytics > Data Factory:

    New data factory

  4. On the New data factory page, enter ADFTutorialDataFactory for Name.

    The name of the Azure data factory must be globally unique. If you see the following error, change the name of the data factory (for example, <yourname>ADFTutorialDataFactory) and try creating again. For naming rules for Data Factory artifacts, see the Data Factory - naming rules article.

    Error when the name is not available

  5. For Subscription, select the Azure subscription in which you want to create the data factory.

  6. For Resource Group, use one of the following steps:

    • Select Use existing, and select an existing resource group from the list.
    • Select Create new, and enter the name of a resource group.

    To learn about resource groups, see Using resource groups to manage your Azure resources.

  7. For Version, select V2.

  8. For Location, select the location for the data factory.

    The list shows only locations that Data Factory supports, and where your Azure Data Factory metadata will be stored. The associated data stores (like Azure Storage and Azure SQL Database) and computes (like Azure HDInsight) that Data Factory uses can run in other regions.

  9. Select Create.

  10. After the creation is complete, you see the Data Factory page. Select the Author & Monitor tile to start the Azure Data Factory user interface (UI) application on a separate tab.

    Home page for the data factory, with the Author & Monitor tile
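
If you later want to automate this step, the same factory can be created with the azure-mgmt-datafactory package. A minimal sketch, with the subscription ID, resource group, and region as placeholders (the factory name must still be globally unique):

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# The management SDK creates a V2 factory; location corresponds to the portal's Location field.
factory = adf_client.factories.create_or_update(
    resource_group,
    "<yourname>ADFTutorialDataFactory",
    Factory(location="eastus"))          # placeholder region

print(factory.name, factory.provisioning_state)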

Start the Copy Data tool

  1. On the Let's get started page, select the Copy Data tile to start the Copy Data tool.

    Copy Data tile

  2. On the Properties page of the Copy Data tool, you can specify a name for the pipeline and its description, then select Next.

    Properties page

  3. On the Source data store page, complete the following steps:

    a. Click + Create new connection to add a connection.

    b. Select the linked service type that you want to create for the source connection. In this tutorial, we use Azure Blob Storage. Select it from the gallery, and then select Continue.

    Select Blob

    c. On the New Linked Service (Azure Blob Storage) page, specify a name for your linked service. Select your storage account from the Storage account name list, test the connection, and then select Create.

    Configure your Azure Blob storage account

    d. Select the newly created linked service as the source, and then click Next.

  4. On the Choose the input file or folder page, complete the following steps:

    a. Click Browse to navigate to the adftutorial/input folder, select the emp.txt file, and then click Choose.

    b. Select the Binary copy checkbox to copy the file as-is, and then select Next.

    Choose the input file or folder page

  5. On the Destination data store page, select the Azure Blob Storage linked service you created, and then select Next.

  6. On the Choose the output file or folder page, enter adftutorial/output for the folder path, and then select Next.

    Choose the output file or folder page

  7. On the Settings page, select Next to use the default configurations.

  8. On the Summary page, review all settings, and select Next.

  9. On the Deployment complete page, select Monitor to monitor the pipeline that you created.

    Deployment complete page

  10. The application switches to the Monitor tab. You see the status of the pipeline on this tab. Select Refresh to refresh the list. Click the link under PIPELINE NAME to view activity run details or rerun the pipeline.

    Refresh the pipeline list

  11. On the Activity runs page, select the Details link (the eyeglasses icon) under the ACTIVITY NAME column for more details about the copy operation. For details about the properties, see the Copy Activity overview.

  12. To go back to the Pipeline Runs view, select the ALL pipeline runs link in the breadcrumb menu. To refresh the view, select Refresh.

  13. Verify that the emp.txt file is created in the output folder of the adftutorial container. If the output folder doesn't exist, the Data Factory service automatically creates it.

  14. Switch to the Author tab above the Monitor tab on the left panel so that you can edit linked services, datasets, and pipelines. To learn about editing them in the Data Factory UI, see Create a data factory by using the Azure portal.

    Select the Author tab
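
Behind the scenes, the Copy Data tool creates a linked service, two datasets, and a pipeline with a copy activity. If you want to define the same objects in code, the following is a minimal sketch using the azure-mgmt-datafactory package. It reuses the factory from the previous sketch; the connection string and identifiers are placeholders, and exact model names can vary slightly between SDK versions.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureBlobStorageLinkedService, BlobSink, BlobSource,
    CopyActivity, DatasetReference, DatasetResource, LinkedServiceReference,
    LinkedServiceResource, PipelineResource, SecureString)

subscription_id = "<subscription-id>"         # placeholder
resource_group = "<resource-group>"           # placeholder
factory_name = "<yourname>ADFTutorialDataFactory"
conn_str = "<storage-connection-string>"      # placeholder

adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Linked service that points at the storage account (used as both source and sink).
adf.linked_services.create_or_update(
    resource_group, factory_name, "AzureBlobStorageLS",
    LinkedServiceResource(properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(value=conn_str))))

ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="AzureBlobStorageLS")

# Dataset for the input file and dataset for the output folder.
adf.datasets.create_or_update(
    resource_group, factory_name, "InputDataset",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=ls_ref,
        folder_path="adftutorial/input",
        file_name="emp.txt")))

adf.datasets.create_or_update(
    resource_group, factory_name, "OutputDataset",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=ls_ref,
        folder_path="adftutorial/output")))

# Pipeline with a single copy activity that copies the blob as-is.
copy_activity = CopyActivity(
    name="CopyFromBlobToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputDataset")],
    source=BlobSource(),
    sink=BlobSink())

adf.pipelines.create_or_update(
    resource_group, factory_name, "CopyPipeline",
    PipelineResource(activities=[copy_activity]))

# Trigger a run and check its status, which is what the Monitor tab shows in the UI.
run = adf.pipelines.create_run(resource_group, factory_name, "CopyPipeline", parameters={})
print(adf.pipeline_runs.get(resource_group, factory_name, run.run_id).status)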

Next steps

The pipeline in this sample copies data from one location to another location in Azure Blob storage. To learn about using Data Factory in more scenarios, go through the tutorials.