Use PowerShell to create a data factory pipeline to copy data from on-premises to Azure

This sample PowerShell script creates a pipeline in Azure Data Factory that copies data from an on-premises SQL Server database to Azure Blob storage.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.
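If you have existing scripts that still use AzureRM cmdlet names, the Az module can alias them for you. A minimal sketch, assuming the Az module is already installed:

# Enable AzureRM compatibility aliases for the current user
Enable-AzureRmAlias -Scope CurrentUser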

This sample requires Azure PowerShell. Run Get-Module -ListAvailable Az to find the version. If you need to install or upgrade, see Install Azure PowerShell module.
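For example, a minimal check-and-install sketch (assumes PowerShellGet is available; installing for the current user avoids the need for an elevated session):

# List the installed versions of the Az module, if any
Get-Module -ListAvailable Az

# Install or update the Az module from the PowerShell Gallery
Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force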

Run Connect-AzAccount -Environment AzureChinaCloud to create a connection with Azure.
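If your account has access to more than one subscription, you can also pin the one this sample should use. A minimal sketch (the subscription ID is a placeholder):

# Sign in to the Azure China cloud
Connect-AzAccount -Environment AzureChinaCloud

# Optional: select the subscription to use for this sample
Set-AzContext -Subscription "<subscription id>"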

Prerequisites

  • SQL Server. You use an on-premises SQL Server database as the source data store in this sample.
  • Azure Storage account. You use Azure Blob storage as the destination/sink data store in this sample. If you don't have an Azure storage account, see the Create a storage account article for steps to create one.
  • Self-hosted integration runtime. Download the MSI file from the download center and run it to install a self-hosted integration runtime on your machine.

Create sample database in SQL Server

  1. In the on-premises SQL Server database, create a table named emp by using the following SQL script:

      CREATE TABLE dbo.emp
      (
          ID int IDENTITY(1,1) NOT NULL,
          FirstName varchar(50),
          LastName varchar(50),
          CONSTRAINT PK_emp PRIMARY KEY (ID)
      )
      GO
    
  2. Insert some sample data into the table (a PowerShell alternative is sketched after this list):

      INSERT INTO emp VALUES ('John', 'Doe')
      INSERT INTO emp VALUES ('Jane', 'Doe')
    
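If you'd rather run these statements from PowerShell than from a SQL client, the following is a minimal sketch. It assumes the SqlServer module is installed (for example, via Install-Module SqlServer) and uses SQL authentication with placeholder values:

# Insert the sample rows from PowerShell (optional alternative)
Invoke-Sqlcmd -ServerInstance "<SQL server name>" -Database "<SQL Server database name>" `
    -Username "<user name>" -Password "<password>" `
    -Query "INSERT INTO emp VALUES ('John', 'Doe'); INSERT INTO emp VALUES ('Jane', 'Doe');"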

Sample script

Important

This script creates JSON files that define Data Factory entities (linked service, dataset, and pipeline) on your hard drive in the c:\ folder.

$resourceGroupName = "<Resource group name>"
$dataFactoryName = "<Data factory name>" # must be globally unique
$storageAccountName = "<Azure Storage account name>"
$storageAccountKey = "<Azure Storage account key>"
$sqlServerName = "<SQL server name>"
$sqlDatabaseName = "<SQL Server database name>"
$sqlTableName = "emp" # create the emp table if it does not already exist in your database with ID, FirstName, and LastName columns of type String. 
$sqlUserName = "<SQL Authentication - user name>"
$sqlPassword = "<SQL Authentication - user password>"
$blobFolderPath = "<Azure blob container name>/<Azure blob folder name>"
$integrationRuntimeName = "<Self-hosted integration runtime name>"
$pipelineName = "SqlServerToBlobPipeline"
$dataFactoryRegion = "China East 2"

# Create a resource group
New-AzResourceGroup -Name $resourceGroupName -Location $dataFactoryRegion

# create a data factory
$df = Set-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Name $dataFactoryName -Location $dataFactoryRegion

# create a self-hosted integration runtime
Set-AzDataFactoryV2IntegrationRuntime -Name $integrationRuntimeName -Type SelfHosted -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName

# get the authorization key from the created integration runtime in the cloud
Get-AzDataFactoryV2IntegrationRuntimeKey -Name $integrationRuntimeName -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName | ConvertTo-Json

# IMPORTANT: Install self-hosted integration runtime on your machine and use one of the keys to register the IR installed on your machine with the cloud service
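## A sketch of the registration step (run it on the machine where the IR is installed).
## The install path below is an assumption for a 5.0 installation; adjust it to your version,
## and replace <authentication key> with one of the keys returned above.
# & "C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe" -RegisterNewNode "<authentication key>"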

# create an Azure Storage linked service

## JSON definition of the linked service. 
$storageLinkedServiceDefinition = @"
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "value": "DefaultEndpointsProtocol=https;AccountName=$storageAccountName;AccountKey=$storageAccountKey;EndpointSuffix=core.chinacloudapi.cn",
                "type": "SecureString"
            }
        }
    }
}
"@

## IMPORTANT: stores the JSON definition in a file that will be used by the Set-AzDataFactoryV2LinkedService command.
$storageLinkedServiceDefinition | Out-File c:\AzureStorageLinkedService.json

## Creates a linked service in the data factory
Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "AzureStorageLinkedService" -File c:\AzureStorageLinkedService.json

# create an on-premises SQL Server linked service

## JSON definition of the linked service. 
$sqlServerLinkedServiceDefinition = @"
{
   "properties": {
     "type": "SqlServer",
     "typeProperties": {
         "connectionString": {
             "type": "SecureString",
            "value": "Server=$sqlServerName;Database=$sqlDatabaseName;User ID=$sqlUserName;Password=$sqlPassword;Timeout=60"
         }
     },
     "connectVia": {
       "type": "integrationRuntimeReference",
       "referenceName": "$integrationRuntimeName"
     }
 },
 "name": "SqlServerLinkedService"
}
"@

## IMPORTANT: stores the JSON definition in a file that will be used by the Set-AzDataFactoryV2LinkedService command.
$sqlServerLinkedServiceDefinition | Out-File c:\SqlServerLinkedService.json

## Encrypt SQL Server credentials by using the self-hosted integration runtime
New-AzDataFactoryV2LinkedServiceEncryptCredential -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -IntegrationRuntimeName $integrationRuntimeName -File "c:\SqlServerLinkedService.json" > c:\EncryptedSqlServerLinkedService.json

# Create the SQL Server linked service with the encrypted credentials
Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "EncryptedSqlServerLinkedService" -File "c:\EncryptedSqlServerLinkedService.json"


# Create a source dataset for source SQL Server Database

## JSON definition of the dataset
$sourceSqlServerDatasetDefinition = @"
{
   "properties": {
        "type": "SqlServerTable",
        "typeProperties": {
            "tableName": "$sqlTableName"
        },
        "structure": [
             {
                "name": "ID",
                "type": "String"
            },
            {
                "name": "FirstName",
                "type": "String"
            },
            {
                "name": "LastName",
                "type": "String"
            }
        ],
        "linkedServiceName": {
            "referenceName": "EncryptedSqlServerLinkedService",
            "type": "LinkedServiceReference"
        }
    },
    "name": "SqlServerDataset"
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Dataset command.
$sourceSqlServerDatasetDefinition | Out-File c:\SqlServerDataset.json

# Create the SQL Server dataset in the data factory
Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "SqlServerDataset" -File "c:\SqlServerDataset.json"

# Create a dataset for sink Azure Blob Storage

## JSON definition of the dataset
$sinkBlobDatasetDefinition = @"
{
    "properties": {
        "type": "AzureBlob",
        "typeProperties": {
            "folderPath": "$blobFolderPath",
            "format": {
                "type": "TextFormat"
            }
        },
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        }
    },
    "name": "AzureBlobDataset"
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Dataset command.
$sinkBlobDatasetDefinition | Out-File c:\AzureBlobDataset.json

## Create the Azure Blob dataset
Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "AzureBlobDataset" -File "c:\AzureBlobDataset.json"


# Create a pipeline in the data factory

## JSON definition of the pipeline
$pipelineDefinition = @"
{
   "name": "$pipelineName",
    "properties": {
        "activities": [       
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource"
                    },
                    "sink": {
                        "type":"BlobSink"
                    }
                },
                "name": "CopySqlServerToAzureBlobActivity",
                "inputs": [
                    {
                        "referenceName": "SqlServerDataset",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "AzureBlobDataset",
                        "type": "DatasetReference"
                    }
                ]
            }
        ]
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Pipeline command.
$pipelineDefinition | Out-File c:\SqlServerToBlobPipeline.json

## Create the pipeline in the data factory
Set-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name "$pipelineName" -File "c:\SqlServerToBlobPipeline.json"


# start the pipeline run
$runId = Invoke-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineName $pipelineName

# Check the pipeline run status until it finishes the copy operation
while ($True) {
    $result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)

    if (($result | Where-Object { $_.Status -eq "InProgress" } | Measure-Object).count -ne 0) {
        Write-Host "Pipeline run status: In Progress" -foregroundcolor "Yellow"
        Start-Sleep -Seconds 30
    }
    else {
        Write-Host "Pipeline $pipelineName run finished. Result:" -foregroundcolor "Yellow"
        $result
        break
    }
}

# Get the activity run details
$result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName `
    -PipelineRunId $runId `
    -RunStartedAfter (Get-Date).AddMinutes(-10) `
    -RunStartedBefore (Get-Date).AddMinutes(10) `
    -ErrorAction Stop

$result

if ($result.Status -eq "Succeeded") {
    $result.Output -join "`r`n"
}
else {
    $result.Error -join "`r`n"
}

# To remove the data factory from the resource group
# Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName
#
# To remove the whole resource group
# Remove-AzResourceGroup -Name $resourceGroupName
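
To inspect the status of the pipeline run itself (rather than its activity runs), you can also query the run by the ID captured above. A minimal sketch:

# Optional: get the pipeline run by its run ID
Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -PipelineRunId $runId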

Clean up deployment

After you run the sample script, you can use the following command to remove the resource group and all resources associated with it:

Remove-AzResourceGroup -ResourceGroupName $resourceGroupName

To remove the data factory from the resource group, run the following command:

Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName

Script explanation

This script uses the following commands:

  • New-AzResourceGroup: Creates a resource group in which all resources are stored.
  • Set-AzDataFactoryV2: Creates a data factory.
  • New-AzDataFactoryV2LinkedServiceEncryptCredential: Encrypts credentials in a linked service and generates a new linked service definition with the encrypted credentials.
  • Set-AzDataFactoryV2LinkedService: Creates a linked service in the data factory. A linked service links a data store or compute to a data factory.
  • Set-AzDataFactoryV2Dataset: Creates a dataset in the data factory. A dataset represents the input/output for an activity in a pipeline.
  • Set-AzDataFactoryV2Pipeline: Creates a pipeline in the data factory. A pipeline contains one or more activities that perform a certain operation. In this pipeline, a copy activity copies data from the on-premises SQL Server database to Azure Blob storage.
  • Invoke-AzDataFactoryV2Pipeline: Creates a run for the pipeline; in other words, runs the pipeline.
  • Get-AzDataFactoryV2ActivityRun: Gets details about the run of the activity (activity run) in the pipeline.
  • Remove-AzResourceGroup: Deletes a resource group including all nested resources.

Next steps

For more information on Azure PowerShell, see the Azure PowerShell documentation.

Additional Azure Data Factory PowerShell script samples can be found in the Azure Data Factory PowerShell samples.