Tutorial: Copy data from a SQL Server database to Azure Blob storage

Applies to: Azure Data Factory, Azure Synapse Analytics (Preview)

In this tutorial, you use Azure PowerShell to create a data factory pipeline that copies data from a SQL Server database to Azure Blob storage. You also create and use a self-hosted integration runtime, which moves data between on-premises and cloud data stores.

Note

This article does not provide a detailed introduction to the Data Factory service. For more information, see Introduction to Azure Data Factory.

In this tutorial, you perform the following steps:

  • Create a data factory.
  • Create a self-hosted integration runtime.
  • Create SQL Server and Azure Storage linked services.
  • Create SQL Server and Azure Blob datasets.
  • Create a pipeline with a copy activity to move the data.
  • Start a pipeline run.
  • Monitor the pipeline run.

Prerequisites

Azure subscription

Before you begin, if you don't already have an Azure subscription, create a trial account.

Azure roles

To create data factory instances, the user account that you use to sign in to Azure must be assigned a Contributor or Owner role, or must be an administrator of the Azure subscription.

To view the permissions that you have in the subscription, go to the Azure portal, select your username in the upper-right corner, and then select Permissions. If you have access to multiple subscriptions, select the appropriate subscription. For sample instructions on adding a user to a role, see Add or remove Azure role assignments using the Azure portal.
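
If you prefer to check from PowerShell, the following sketch lists the role assignments for your account after you sign in (see the Log in to PowerShell section later in this tutorial); the sign-in name is a placeholder that you replace with your own.

    # A minimal sketch: list role assignments for your account.
    # <user@contoso.com> is a placeholder for your Azure sign-in name.
    Get-AzRoleAssignment -SignInName "<user@contoso.com>"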

SQL Server 2014, 2016, and 2017

In this tutorial, you use a SQL Server database as a source data store. The pipeline in the data factory that you create in this tutorial copies data from this SQL Server database (source) to Azure Blob storage (sink). You then create a table named emp in your SQL Server database and insert a couple of sample entries into the table.

  1. Start SQL Server Management Studio. If it is not already installed on your machine, go to Download SQL Server Management Studio.

  2. Connect to your SQL Server instance by using your credentials.

  3. Create a sample database. In the tree view, right-click Databases, and then select New Database.

  4. In the New Database window, enter a name for the database, and then select OK.

  5. To create the emp table and insert some sample data into it, run the following query script against the database. In the tree view, right-click the database that you created, and then select New Query.

    CREATE TABLE dbo.emp
    (
        ID int IDENTITY(1,1) NOT NULL,
        FirstName varchar(50),
        LastName varchar(50)
    )
    GO
    
    INSERT INTO emp (FirstName, LastName) VALUES ('John', 'Doe')
    INSERT INTO emp (FirstName, LastName) VALUES ('Jane', 'Doe')
    GO
    

Azure Storage account

In this tutorial, you use a general-purpose Azure storage account (specifically, Azure Blob storage) as a destination/sink data store. If you don't have a general-purpose Azure storage account, see Create a storage account. The pipeline in the data factory that you create in this tutorial copies data from the SQL Server database (source) to this Azure Blob storage (sink).

Get storage account name and account key

You use the name and key of your Azure storage account in this tutorial. Get the name and key of your storage account by doing the following:

  1. Sign in to the Azure portal with your Azure username and password.

  2. In the left pane, select More services, filter by using the Storage keyword, and then select Storage accounts.

    Search for your storage account

  3. In the list of storage accounts, filter for your storage account (if needed), and then select it.

  4. In the Storage account window, select Access keys.

  5. Copy the values in the Storage account name and key1 boxes, and then paste them into Notepad or another editor for later use in the tutorial.
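
Alternatively, after you sign in with Azure PowerShell, a sketch like the following retrieves the key without the portal; the resource group and storage account names are placeholders for your own values.

    # A minimal sketch: retrieve the storage account key from PowerShell.
    # <storageResourceGroup> and <storageAccountName> are placeholders.
    $keys = Get-AzStorageAccountKey -ResourceGroupName "<storageResourceGroup>" -Name "<storageAccountName>"
    $keys[0].Value   # corresponds to key1 in the portal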

Create the adftutorial container

In this section, you create a blob container named adftutorial in your Azure Blob storage.

  1. In the Storage account window, switch to Overview, and then select Blobs.

    Select the Blobs option

  2. In the Blob service window, select Container.

  3. In the New container window, in the Name box, enter adftutorial, and then select OK.

    Enter a container name

  4. In the list of containers, select adftutorial.

  5. Keep the container window for adftutorial open. You use it to verify the output at the end of the tutorial. Data Factory automatically creates the output folder in this container, so you don't need to create one.
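
You can also create the container from PowerShell instead of the portal; this sketch assumes you substitute the storage account name and key that you noted earlier.

    # A minimal sketch: create the adftutorial container from PowerShell.
    # <storageAccountName> and <storageAccountKey> are the values you noted earlier.
    $context = New-AzStorageContext -StorageAccountName "<storageAccountName>" -StorageAccountKey "<storageAccountKey>" -Environment AzureChinaCloud
    New-AzStorageContainer -Name "adftutorial" -Context $context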

Windows PowerShell

Install Azure PowerShell

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Install the latest version of Azure PowerShell if you don't already have it on your machine. For detailed instructions, see How to install and configure Azure PowerShell.
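
For reference, a typical installation from the PowerShell Gallery looks like the following sketch; the linked installation article remains the authoritative source for prerequisites and options.

    # Install the Az module for the current user from the PowerShell Gallery.
    Install-Module -Name Az -Repository PSGallery -Scope CurrentUser -AllowClobber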

Log in to PowerShell

  1. Start PowerShell on your machine, and keep it open through completion of this tutorial. If you close and reopen it, you'll need to run these commands again.

  2. Run the following command, and then enter the Azure username and password that you use to sign in to the Azure portal:

    Connect-AzAccount -Environment AzureChinaCloud
    
  3. If you have multiple Azure subscriptions, run the following command to select the subscription that you want to work with. Replace SubscriptionId with the ID of your Azure subscription:

    Select-AzSubscription -SubscriptionId "<SubscriptionId>"    
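
    If you're not sure which subscription ID to use, you can list the subscriptions available to your account:

    # List subscriptions for the signed-in account.
    Get-AzSubscription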
    

Create a data factory

  1. Define a variable for the resource group name that you'll use later in PowerShell commands. Copy the following command to PowerShell, specify a name for the Azure resource group (enclosed in double quotation marks; for example, "adfrg"), and then run the command.

    $resourceGroupName = "ADFTutorialResourceGroup"
    
  2. To create the Azure resource group, run the following command:

    New-AzResourceGroup $resourceGroupName -location 'China East 2'
    

    If the resource group already exists, you may not want to overwrite it. Assign a different value to the $resourceGroupName variable, and then run the command again.

  3. Define a variable for the data factory name that you can use in PowerShell commands later. The name must start with a letter or a number, and it can contain only letters, numbers, and the dash (-) character.

    Important

    Update the data factory name with a globally unique name. An example is ADFTutorialFactorySP1127.

    $dataFactoryName = "ADFTutorialFactory"
    
  4. Define a variable for the location of the data factory:

    $location = "China East 2"
    
  5. To create the data factory, run the following Set-AzDataFactoryV2 cmdlet:

    Set-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Location $location -Name $dataFactoryName
    

Note

  • The name of the data factory must be globally unique. If you receive the following error, change the name and try again.
    The specified data factory name 'ADFv2TutorialDataFactory' is already in use. Data factory names must be globally unique.
    
  • To create data factory instances, the user account that you use to sign in to Azure must be assigned a Contributor or Owner role, or must be an administrator of the Azure subscription.
  • For a list of Azure regions in which Data Factory is currently available, go to Products available by region, select the regions that interest you, and then expand Analytics to locate Data Factory. The data stores (Azure Storage, Azure SQL Database, and so on) and computes (Azure HDInsight, and so on) used by the data factory can be in other regions.

Create a self-hosted integration runtime

In this section, you create a self-hosted integration runtime and associate it with the on-premises machine that hosts the SQL Server database. The self-hosted integration runtime is the component that copies data from the SQL Server database on your machine to Azure Blob storage.

  1. Create a variable for the name of the integration runtime. Use a unique name, and note it. You use it later in this tutorial.

    $integrationRuntimeName = "ADFTutorialIR"
    
  2. Create a self-hosted integration runtime.

    Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $integrationRuntimeName -Type SelfHosted -Description "selfhosted IR description"
    

    Here is the sample output:

    Name              : ADFTutorialIR
    Type              : SelfHosted
    ResourceGroupName : <resourceGroupName>
    DataFactoryName   : <dataFactoryName>
    Description       : selfhosted IR description
    Id                : /subscriptions/<subscription ID>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<dataFactoryName>/integrationruntimes/<integrationRuntimeName>
    
  3. To retrieve the status of the created integration runtime, run the following command:

    Get-AzDataFactoryV2IntegrationRuntime -name $integrationRuntimeName -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Status
    

    Here is the sample output:

    State                     : NeedRegistration
    Version                   :
    CreateTime                : 9/10/2019 3:24:09 AM
    AutoUpdate                : On
    ScheduledUpdateDate       :
    UpdateDelayOffset         :
    LocalTimeZoneOffset       :
    InternalChannelEncryption :
    Capabilities              : {}
    ServiceUrls               : {chinae2.frontend.datamovement.azure.cn}
    Nodes                     : {}
    Links                     : {}
    Name                      : <Integration Runtime name>
    Type                      : SelfHosted
    ResourceGroupName         : <resourceGroup name>
    DataFactoryName           : <dataFactory name>
    Description               : selfhosted IR description
    Id                        : /subscriptions/<subscription ID>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<dataFactoryName>/integrationruntimes/<integrationRuntimeName>
    
  4. To retrieve the authentication keys used to register the self-hosted integration runtime with the Data Factory service in the cloud, run the following command. Copy one of the keys (excluding the quotation marks) to register the self-hosted integration runtime that you install on your machine in the next step.

    Get-AzDataFactoryV2IntegrationRuntimeKey -Name $integrationRuntimeName -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName | ConvertTo-Json
    

    Here is the sample output:

    {
        "AuthKey1":  "IR@0000000000-0000-0000-0000-000000000000@xy0@xy@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=",
        "AuthKey2":  "IR@0000000000-0000-0000-0000-000000000000@xy0@xy@yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy="
    }
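
    If you're scripting the registration, you can also capture a key directly into a variable; this is a sketch that reads the AuthKey1 property of the object returned above:

    # Capture one authentication key for scripted registration (sketch).
    $authKey = (Get-AzDataFactoryV2IntegrationRuntimeKey -Name $integrationRuntimeName -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName).AuthKey1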
    

Install the integration runtime

  1. Download Azure Data Factory Integration Runtime on a local Windows machine, and then run the installation.

  2. In the Welcome to Microsoft Integration Runtime Setup wizard, select Next.

  3. In the End-User License Agreement window, accept the terms of the license agreement, and then select Next.

  4. In the Destination Folder window, select Next.

  5. In the Ready to install Microsoft Integration Runtime window, select Install.

  6. In the Completed the Microsoft Integration Runtime Setup wizard, select Finish.

  7. In the Register Integration Runtime (Self-hosted) window, paste the key that you saved in the previous section, and then select Register.

    Register the integration runtime

  8. In the New Integration Runtime (Self-hosted) Node window, select Finish.

    The New Integration Runtime Node window

  9. When the self-hosted integration runtime is registered successfully, the following message is displayed:

    Registered successfully

  10. In the Register Integration Runtime (Self-hosted) window, select Launch Configuration Manager.

  11. When the node is connected to the cloud service, the following message is displayed:

    Node is connected

  12. Test the connectivity to your SQL Server database by doing the following:

    a. In the Configuration Manager window, switch to the Diagnostics tab.

    b. In the Data source type box, select SqlServer.

    c. Enter the server name.

    d. Enter the database name.

    e. Select the authentication mode.

    f. Enter the username.

    g. Enter the password that's associated with the username.

    h. To confirm that the integration runtime can connect to SQL Server, select Test.

    Connectivity succeeded

    If the connection is successful, a green checkmark icon is displayed. Otherwise, you'll receive an error message associated with the failure. Fix any issues, and ensure that the integration runtime can connect to your SQL Server instance.

    Note all the preceding values for later use in this tutorial.
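
    If you'd rather script a similar connectivity check, the following PowerShell sketch opens a test connection through .NET's SqlClient on the integration runtime machine; it assumes SQL authentication, and the bracketed values are placeholders for the values you noted above.

    # A minimal sketch: test SQL Server connectivity from the IR machine.
    # <serverName>, <databaseName>, <userName>, and <password> are placeholders.
    $connectionString = "Data Source=<serverName>;Initial Catalog=<databaseName>;User ID=<userName>;Password=<password>"
    $connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
    try {
        $connection.Open()
        Write-Host "Connectivity succeeded"
    }
    catch {
        Write-Host "Connection failed: $($_.Exception.Message)"
    }
    finally {
        $connection.Dispose()
    }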

Create linked services

To link your data stores and compute services to the data factory, create linked services in the data factory. In this tutorial, you link your Azure storage account and SQL Server instance to the data factory. The linked services contain the connection information that the Data Factory service uses at runtime to connect to them.

Create an Azure Storage linked service (destination/sink)

In this step, you link your Azure storage account to the data factory.

  1. Create a JSON file named AzureStorageLinkedService.json in the C:\ADFv2Tutorial folder, with the following code. If the ADFv2Tutorial folder does not already exist, create it.

    Important

    Before you save the file, replace <accountName> and <accountKey> with the name and key of your Azure storage account. You noted them in the Prerequisites section.

     {
         "name": "AzureStorageLinkedService",
         "properties": {
             "annotations": [],
             "type": "AzureBlobStorage",
             "typeProperties": {
                 "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>;EndpointSuffix=core.chinacloudapi.cn"
             }
         }
     }
    
  2. In PowerShell, switch to the C:\ADFv2Tutorial folder.

    Set-Location 'C:\ADFv2Tutorial'    
    
  3. To create the linked service AzureStorageLinkedService, run the following Set-AzDataFactoryV2LinkedService cmdlet:

    Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $ResourceGroupName -Name "AzureStorageLinkedService" -File ".\AzureStorageLinkedService.json"
    

    Here is a sample output:

    LinkedServiceName : AzureStorageLinkedService
    ResourceGroupName : <resourceGroup name>
    DataFactoryName   : <dataFactory name>
    Properties        : Microsoft.Azure.Management.DataFactory.Models.AzureBlobStorageLinkedService
    

    If you receive a "file not found" error, confirm that the file exists by running the dir command. If the file name has a .txt extension (for example, AzureStorageLinkedService.json.txt), remove it, and then run the PowerShell command again.
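
    Optionally, you can confirm that the linked service now exists in the factory by querying it:

    Get-AzDataFactoryV2LinkedService -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name "AzureStorageLinkedService"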

Create and encrypt a SQL Server linked service (source)

In this step, you link your SQL Server instance to the data factory.

  1. Create a JSON file named SqlServerLinkedService.json in the C:\ADFv2Tutorial folder by using the following code:

    Important

    Use the section that corresponds to the authentication that you use to connect to SQL Server.

    Using SQL authentication (sa):

    {  
        "name":"SqlServerLinkedService",
        "type":"Microsoft.DataFactory/factories/linkedservices",
        "properties":{  
            "annotations":[  
    
            ],
            "type":"SqlServer",
            "typeProperties":{  
                "connectionString":"integrated security=False;data source=<serverName>;initial catalog=<databaseName>;user id=<userName>;password=<password>"
            },
            "connectVia":{  
                "referenceName":"<integration runtime name> ",
                "type":"IntegrationRuntimeReference"
            }
        }
    }
    

    Using Windows authentication:

    {  
        "name":"SqlServerLinkedService",
        "type":"Microsoft.DataFactory/factories/linkedservices",
        "properties":{  
            "annotations":[  
    
            ],
            "type":"SqlServer",
            "typeProperties":{  
                "connectionString":"integrated security=True;data source=<serverName>;initial catalog=<databaseName>",
                "userName":"<username> or <domain>\\<username>",
                "password":{  
                    "type":"SecureString",
                    "value":"<password>"
                }
            },
            "connectVia":{  
                "referenceName":"<integration runtime name>",
                "type":"IntegrationRuntimeReference"
            }
        }
    }
    

    Important

    • Use the section that corresponds to the authentication that you use to connect to your SQL Server instance.
    • Replace <integration runtime name> with the name of your integration runtime.
    • Before you save the file, replace <servername>, <databasename>, <username>, and <password> with the values for your SQL Server instance.
    • If you need to use a backslash (\) in your user account or server name, precede it with the escape character (\). For example, use mydomain\\myuser.
  2. To encrypt the sensitive data (username, password, and so on), run the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet.
    This encryption ensures that the credentials are encrypted by using the Data Protection Application Programming Interface (DPAPI). The encrypted credentials are stored locally on the self-hosted integration runtime node (local machine). The output payload is redirected to another JSON file (in this case, encryptedSqlServerLinkedService.json) that contains the encrypted credentials.

    New-AzDataFactoryV2LinkedServiceEncryptedCredential -DataFactoryName $dataFactoryName -ResourceGroupName $ResourceGroupName -IntegrationRuntimeName $integrationRuntimeName -File ".\SqlServerLinkedService.json" > encryptedSqlServerLinkedService.json
    
  3. Run the following command, which creates EncryptedSqlServerLinkedService:

    Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $ResourceGroupName -Name "EncryptedSqlServerLinkedService" -File ".\encryptedSqlServerLinkedService.json"
    

Create datasets

In this step, you create input and output datasets. They represent the input and output data for the copy operation, which copies data from the SQL Server database to Azure Blob storage.

Create a dataset for the source SQL Server database

In this step, you define a dataset that represents data in the SQL Server database instance. The dataset is of type SqlServerTable. It refers to the SQL Server linked service that you created in the preceding step. The linked service has the connection information that the Data Factory service uses to connect to your SQL Server instance at runtime. This dataset specifies the SQL table in the database that contains the data. In this tutorial, the emp table contains the source data.

  1. Create a JSON file named SqlServerDataset.json in the C:\ADFv2Tutorial folder, with the following code:

    {  
        "name":"SqlServerDataset",
        "properties":{  
            "linkedServiceName":{  
                "referenceName":"EncryptedSqlServerLinkedService",
                "type":"LinkedServiceReference"
            },
            "annotations":[  
    
            ],
            "type":"SqlServerTable",
            "schema":[  
    
            ],
            "typeProperties":{  
                "schema":"dbo",
                "table":"emp"
            }
        }
    }
    
  2. To create the dataset SqlServerDataset, run the Set-AzDataFactoryV2Dataset cmdlet.

    Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "SqlServerDataset" -File ".\SqlServerDataset.json"
    

    Here is the sample output:

    DatasetName       : SqlServerDataset
    ResourceGroupName : <resourceGroupName>
    DataFactoryName   : <dataFactoryName>
    Structure         :
    Properties        : Microsoft.Azure.Management.DataFactory.Models.SqlServerTableDataset
    

Create a dataset for Azure Blob storage (sink)

In this step, you define a dataset that represents data to be copied to Azure Blob storage. The dataset is of type DelimitedText, with an Azure Blob storage location. It refers to the Azure Storage linked service that you created earlier in this tutorial.

The linked service has the connection information that the data factory uses at runtime to connect to your Azure storage account. This dataset specifies the folder in the Azure storage to which the data is copied from the SQL Server database. In this tutorial, the folder is adftutorial/fromonprem, where adftutorial is the blob container and fromonprem is the folder.

  1. Create a JSON file named AzureBlobDataset.json in the C:\ADFv2Tutorial folder, with the following code:

    {  
        "name":"AzureBlobDataset",
        "properties":{  
            "linkedServiceName":{  
                "referenceName":"AzureStorageLinkedService",
                "type":"LinkedServiceReference"
            },
            "annotations":[  
    
            ],
            "type":"DelimitedText",
            "typeProperties":{  
                "location":{  
                    "type":"AzureBlobStorageLocation",
                    "folderPath":"fromonprem",
                    "container":"adftutorial"
                },
                "columnDelimiter":",",
                "escapeChar":"\\",
                "quoteChar":"\""
            },
            "schema":[  
    
            ]
        },
        "type":"Microsoft.DataFactory/factories/datasets"
    }
    
  2. To create the dataset AzureBlobDataset, run the Set-AzDataFactoryV2Dataset cmdlet.

    Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "AzureBlobDataset" -File ".\AzureBlobDataset.json"
    

    Here is the sample output:

    DatasetName       : AzureBlobDataset
    ResourceGroupName : <resourceGroupName>
    DataFactoryName   : <dataFactoryName>
    Structure         :
    Properties        : Microsoft.Azure.Management.DataFactory.Models.DelimitedTextDataset
    

Create a pipeline

In this tutorial, you create a pipeline with a copy activity. The copy activity uses SqlServerDataset as the input dataset and AzureBlobDataset as the output dataset. The source type is set to SqlServerSource, and the sink type is set to DelimitedTextSink.

  1. Create a JSON file named SqlServerToBlobPipeline.json in the C:\ADFv2Tutorial folder, with the following code:

    {  
        "name":"SqlServerToBlobPipeline",
        "properties":{  
            "activities":[  
                {  
                    "name":"CopySqlServerToAzureBlobActivity",
                    "type":"Copy",
                    "dependsOn":[  
    
                    ],
                    "policy":{  
                        "timeout":"7.00:00:00",
                        "retry":0,
                        "retryIntervalInSeconds":30,
                        "secureOutput":false,
                        "secureInput":false
                    },
                    "userProperties":[  
    
                    ],
                    "typeProperties":{  
                        "source":{  
                            "type":"SqlServerSource"
                        },
                        "sink":{  
                            "type":"DelimitedTextSink",
                            "storeSettings":{  
                                "type":"AzureBlobStorageWriteSettings"
                            },
                            "formatSettings":{  
                                "type":"DelimitedTextWriteSettings",
                                "quoteAllText":true,
                                "fileExtension":".txt"
                            }
                        },
                        "enableStaging":false
                    },
                    "inputs":[  
                        {  
                            "referenceName":"SqlServerDataset",
                            "type":"DatasetReference"
                        }
                    ],
                    "outputs":[  
                        {  
                            "referenceName":"AzureBlobDataset",
                            "type":"DatasetReference"
                        }
                    ]
                }
            ],
            "annotations":[  
    
            ]
        }
    }
    
  2. To create the pipeline SQLServerToBlobPipeline, run the Set-AzDataFactoryV2Pipeline cmdlet.

    Set-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "SQLServerToBlobPipeline" -File ".\SQLServerToBlobPipeline.json"
    

    Here is the sample output:

    PipelineName      : SQLServerToBlobPipeline
    ResourceGroupName : <resourceGroupName>
    DataFactoryName   : <dataFactoryName>
    Activities        : {CopySqlServerToAzureBlobActivity}
    Parameters        :  
    

Create a pipeline run

Start a pipeline run for the SQLServerToBlobPipeline pipeline, and capture the pipeline run ID for future monitoring.

$runId = Invoke-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineName 'SQLServerToBlobPipeline'
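
Optionally, you can do a quick one-off status check on the run you just started before moving to the monitoring loop in the next section:

Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -PipelineRunId $runId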

Monitor the pipeline run

  1. To continuously check the run status of the pipeline SQLServerToBlobPipeline, run the following script in PowerShell, and print the final result:

    while ($True) {
        $result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
    
        if (($result | Where-Object { $_.Status -eq "InProgress" } | Measure-Object).count -ne 0) {
            Write-Host "Pipeline run status: In Progress" -foregroundcolor "Yellow"
            Start-Sleep -Seconds 30
        }
        else {
            Write-Host "Pipeline 'SQLServerToBlobPipeline' run finished. Result:" -foregroundcolor "Yellow"
            $result
            break
        }
    }
    

    Here is the output of the sample run:

    ResourceGroupName    : <resourceGroupName>
    DataFactoryName      : <dataFactoryName>
    ActivityRunId        : 24af7cf6-efca-4a95-931d-067c5c921c25
    ActivityName         : CopySqlServerToAzureBlobActivity
    ActivityType         : Copy
    PipelineRunId        : 7b538846-fd4e-409c-99ef-2475329f5729
    PipelineName         : SQLServerToBlobPipeline
    Input                : {source, sink, enableStaging}
    Output               : {dataRead, dataWritten, filesWritten, sourcePeakConnections...}
    LinkedServiceName    :
    ActivityRunStart     : 9/11/2019 7:10:37 AM
    ActivityRunEnd       : 9/11/2019 7:10:58 AM
    DurationInMs         : 21094
    Status               : Succeeded
    Error                : {errorCode, message, failureType, target}
    AdditionalProperties : {[retryAttempt, ], [iterationHash, ], [userProperties, {}], [recoveryStatus, None]...}
    
  2. To check the detailed activity run result for the pipeline SQLServerToBlobPipeline, run the following command:

    Write-Host "Pipeline 'SQLServerToBlobPipeline' run result:" -foregroundcolor "Yellow"
    ($result | Where-Object {$_.ActivityName -eq "CopySqlServerToAzureBlobActivity"}).Output.ToString()
    

    Here is the output of the sample run:

    {  
        "dataRead":36,
        "dataWritten":32,
        "filesWritten":1,
        "sourcePeakConnections":1,
        "sinkPeakConnections":1,
        "rowsRead":2,
        "rowsCopied":2,
        "copyDuration":18,
        "throughput":0.01,
        "errors":[  
    
        ],
        "effectiveIntegrationRuntime":"ADFTutorialIR",
        "usedParallelCopies":1,
        "executionDetails":[  
            {  
                "source":{  
                    "type":"SqlServer"
                },
                "sink":{  
                    "type":"AzureBlobStorage",
                    "region":"ChinaEast2"
                },
                "status":"Succeeded",
                "start":"2019-09-11T07:10:38.2342905Z",
                "duration":18,
                "usedParallelCopies":1,
                "detailedDurations":{  
                    "queuingDuration":6,
                    "timeToFirstByte":0,
                    "transferDuration":5
                }
            }
        ]
    }
    

Verify the output

The pipeline automatically creates the output folder named fromonprem in the adftutorial blob container. Confirm that you see the dbo.emp.txt file in the output folder.

  1. In the Azure portal, in the adftutorial container window, select Refresh to see the output folder.

  2. Select fromonprem in the list of folders.

  3. Confirm that you see a file named dbo.emp.txt.

    The output file
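
You can also verify the output from PowerShell. This sketch lists the copied blob and assumes $context is the storage context created earlier in this tutorial (re-create it with New-AzStorageContext if you closed the session):

    Get-AzStorageBlob -Container "adftutorial" -Prefix "fromonprem/" -Context $context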

Next steps

The pipeline in this sample copies data from a SQL Server database to Azure Blob storage. You learned how to:

  • Create a data factory.
  • Create a self-hosted integration runtime.
  • Create SQL Server and Azure Storage linked services.
  • Create SQL Server and Azure Blob datasets.
  • Create a pipeline with a copy activity to move the data.
  • Start a pipeline run.
  • Monitor the pipeline run.

For a list of data stores that are supported by Data Factory, see supported data stores.

To learn about copying data in bulk from a source to a destination, advance to the following tutorial: