PowerShell script - copy multiple tables in bulk by using Azure Data Factory
This sample PowerShell script copies data from multiple tables in Azure SQL Database to Azure Synapse Analytics.
Note
We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
This sample requires Azure PowerShell. Run Get-Module -ListAvailable Az to find the installed version. If you need to install or upgrade, see Install the Azure PowerShell module.
Run the Connect-AzAccount -Environment AzureChinaCloud cmdlet to connect to Azure operated by 21Vianet.
See Tutorial: Bulk copy for the prerequisites for running this sample.
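Before running the sample, the setup steps above can be sketched as follows (output will vary by environment):

```powershell
# Check which version of the Az module is installed
Get-Module -ListAvailable Az

# Sign in to Azure operated by 21Vianet (Azure China)
Connect-AzAccount -Environment AzureChinaCloud

# Optionally confirm the active subscription context after signing in
Get-AzContext
```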
Sample script
Important
This script creates JSON files that define Data Factory entities (linked service, dataset, and pipeline) on your hard drive in the c:\ folder.
# Set variables with your own values
$resourceGroupName = "<resource group name>"
$dataFactoryName = "<data factory name>" # Name of the data factory must be globally unique
$dataFactoryNameLocation = "China East 2"
$azureSqlServer = "<azure sql server name>"
$azureSqlServerUser = "<azure sql server user>"
$azureSqlServerUserPassword = "<azure sql server password>"
$azureSqlDatabase = "<source azure sql database name>"
$azureSqlDataWarehouse = "<sink azure sql data warehouse name>"
$azureStorageAccount = "<Azure storage account name>"
$azureStorageAccountKey = "<Azure storage account key>"
# No need to change values for these variables
$azureSqlDatabaseLinkedService = "AzureSqlDatabaseLinkedService"
$azureSqlDataWarehouseLinkedService = "AzureSqlDataWarehouseLinkedService"
$azureStorageLinkedService = "AzureStorageLinkedService"
$azureSqlDatabaseDataset = "AzureSqlDatabaseDataset"
$azureSqlDataWarehouseDataset = "AzureSqlDataWarehouseDataset"
$IterateAndCopySQLTablesPipeline = "IterateAndCopySQLTables"
$pipelineGetTableListAndTriggerCopyData = "GetTableListAndTriggerCopyData"
# Create a resource group
New-AzResourceGroup -Name $resourceGroupName -Location $dataFactoryNameLocation
# Create a data factory
$df = Set-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Location $dataFactoryNameLocation -Name $dataFactoryName
# create a linked service for Azure SQL Database (source)
$azureSQLDatabaseLinkedServiceDefinition = @"
{
"name": "$azureSqlDatabaseLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:$azureSqlServer.database.chinacloudapi.cn,1433;Database=$azureSqlDatabase;User ID=$azureSqlServerUser;Password=$azureSqlServerUserPassword;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2LinkedService command.
$azureSQLDatabaseLinkedServiceDefinition | Out-File c:\$azureSqlDatabaseLinkedService.json
## Create the Azure SQL Database linked service
Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDatabaseLinkedService -File c:\$azureSqlDatabaseLinkedService.json
# Create a linked service for Azure SQL Data Warehouse (sink)
$azureSQLDataWarehouseLinkedServiceDefinition = @"
{
"name": "$azureSqlDataWarehouseLinkedService",
"properties": {
"type": "AzureSqlDW",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:$azureSqlServer.database.chinacloudapi.cn,1433;Database=$azureSqlDataWarehouse;User ID=$azureSqlServerUser;Password=$azureSqlServerUserPassword;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2LinkedService command.
$azureSQLDataWarehouseLinkedServiceDefinition | Out-File c:\$azureSqlDataWarehouseLinkedService.json
## Create the Azure SQL Data Warehouse linked service
Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDataWarehouseLinkedService -File c:\$azureSqlDataWarehouseLinkedService.json
# Create a linked service for Azure Storage (interim staging store to enable PolyBase)
$storageLinkedServiceDefinition = @"
{
"name": "$azureStorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": {
"value": "DefaultEndpointsProtocol=https;AccountName=$azureStorageAccount;AccountKey=$azureStorageAccountKey;EndpointSuffix=core.chinacloudapi.cn",
"type": "SecureString"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2LinkedService command.
$storageLinkedServiceDefinition | Out-File c:\$azureStorageLinkedService.json
## Create the Azure Storage linked service
Set-AzDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureStorageLinkedService -File c:\$azureStorageLinkedService.json
# create the input dataset (Azure SQL Database)
$azureSqlDatabaseDatasetDefinition = @"
{
"name": "$azureSqlDatabaseDataset",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": {
"referenceName": "$azureSqlDatabaseLinkedService",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "dummy"
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Dataset command.
$azureSqlDatabaseDatasetDefinition | Out-File c:\$azureSqlDatabaseDataset.json
## Create a dataset in the data factory
Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDatabaseDataset -File "c:\$azureSqlDatabaseDataset.json"
# create the output dataset (Azure SQL Data Warehouse)
$azureSqlDataWarehouseDatasetDefinition = @"
{
"name": "$azureSqlDataWarehouseDataset",
"properties": {
"type": "AzureSqlDWTable",
"linkedServiceName": {
"referenceName": "$azureSqlDataWarehouseLinkedService",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": {
"value": "@{dataset().DWTableName}",
"type": "Expression"
}
},
"parameters":{
"DWTableName":{
"type":"String"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Dataset command.
$azureSqlDataWarehouseDatasetDefinition | Out-File c:\$azureSqlDataWarehouseDataset.json
## Create a dataset in the data factory
Set-AzDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDataWarehouseDataset -File "c:\$azureSqlDataWarehouseDataset.json"
# Create a pipeline in the data factory that copies data from source SQL Database to sink SQL Data Warehouse
$pipelineDefinition = @"
{
"name": "$IterateAndCopySQLTablesPipeline",
"properties": {
"activities": [
{
"name": "IterateSQLTables",
"type": "ForEach",
"typeProperties": {
"isSequential": "false",
"items": {
"value": "@pipeline().parameters.tableList",
"type": "Expression"
},
"activities": [
{
"name": "CopyData",
"description": "Copy data from SQL database to SQL DW",
"type": "Copy",
"inputs": [
{
"referenceName": "$azureSqlDatabaseDataset",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "$azureSqlDataWarehouseDataset",
"type": "DatasetReference",
"parameters": {
"DWTableName": "[@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
}
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
},
"sink": {
"type": "SqlDWSink",
"preCopyScript": "TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
"allowPolyBase": true
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": {
"referenceName": "$azureStorageLinkedService",
"type": "LinkedServiceReference"
}
}
}
}
]
}
}
],
"parameters": {
"tableList": {
"type": "Object"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Pipeline command.
$pipelineDefinition | Out-File c:\$IterateAndCopySQLTablesPipeline.json
## Create a pipeline in the data factory
Set-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $IterateAndCopySQLTablesPipeline -File "c:\$IterateAndCopySQLTablesPipeline.json"
# Create a pipeline in the data factory that retrieves a list of tables and invokes the above pipeline for each table to be copied
$pipeline2Definition = @"
{
"name":"$pipelineGetTableListAndTriggerCopyData",
"properties":{
"activities":[
{
"name": "LookupTableList",
"description": "Retrieve the table list from Azure SQL Database",
"type": "Lookup",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'SalesLT' and TABLE_NAME <> 'ProductModel'"
},
"dataset": {
"referenceName": "$azureSqlDatabaseDataset",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "TriggerCopy",
"type": "ExecutePipeline",
"typeProperties": {
"parameters": {
"tableList": {
"value": "@activity('LookupTableList').output.value",
"type": "Expression"
}
},
"pipeline": {
"referenceName": "$IterateAndCopySQLTablesPipeline",
"type": "PipelineReference"
},
"waitOnCompletion": true
},
"dependsOn": [
{
"activity": "LookupTableList",
"dependencyConditions": [
"Succeeded"
]
}
]
}
]
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryV2Pipeline command.
$pipeline2Definition | Out-File c:\$pipelineGetTableListAndTriggerCopyData.json
## Create a pipeline in the data factory
Set-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $pipelineGetTableListAndTriggerCopyData -File "c:\$pipelineGetTableListAndTriggerCopyData.json"
# Create a pipeline run
## JSON definition for dummy pipeline parameters
$pipelineParameters = @"
{
"dummy": "b"
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Invoke-AzDataFactoryV2Pipeline command.
$pipelineParameters | Out-File c:\PipelineParameters.json
# Create a pipeline run by using parameters
$runId = Invoke-AzDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineName $pipelineGetTableListAndTriggerCopyData -ParameterFile c:\PipelineParameters.json
# Check the pipeline run status until it finishes the copy operation
Start-Sleep -Seconds 30
while ($True) {
$result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
if (($result | Where-Object { $_.Status -eq "InProgress" } | Measure-Object).count -ne 0) {
Write-Host "Pipeline run status: In Progress" -foregroundcolor "Yellow"
Start-Sleep -Seconds 30
}
else {
Write-Host "Pipeline '$pipelineGetTableListAndTriggerCopyData' run finished. Result:" -foregroundcolor "Yellow"
$result
break
}
}
# Get the activity run details
$result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName `
-PipelineRunId $runId `
-RunStartedAfter (Get-Date).AddMinutes(-10) `
-RunStartedBefore (Get-Date).AddMinutes(10) `
-ErrorAction Stop
$result
if ($result.Status -eq "Succeeded") {
    $result.Output -join "`r`n"
}
else {
    $result.Error -join "`r`n"
}
# To remove the data factory from the resource group
# Remove-AzDataFactory -Name $dataFactoryName -ResourceGroupName $resourceGroupName
#
# To remove the whole resource group
# Remove-AzResourceGroup -Name $resourceGroupName
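The GetTableListAndTriggerCopyData pipeline works because the Lookup activity's output matches what the ForEach activity expects. With firstRowOnly set to false, the expression @activity('LookupTableList').output.value resolves to a JSON array with one object per table row returned by the query; the table names below are illustrative examples from the AdventureWorksLT SalesLT schema:

```json
[
  { "TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Customer" },
  { "TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Product" }
]
```

Inside the ForEach activity, each @item() is one of these objects, which is why the copy activity can build source and sink table names from @{item().TABLE_SCHEMA} and @{item().TABLE_NAME}.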
Clean up deployment
After you run the sample script, you can use the following command to remove the resource group and all resources associated with it:
Remove-AzResourceGroup -Name $resourceGroupName
To remove the data factory from the resource group, run the following command:
Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName
Script explanation
This script uses the following commands:
Command | Notes |
---|---|
New-AzResourceGroup | Creates a resource group in which all resources are stored. |
Set-AzDataFactoryV2 | Creates a data factory. |
Set-AzDataFactoryV2LinkedService | Creates a linked service in the data factory. A linked service links a data store or compute to a data factory. |
Set-AzDataFactoryV2Dataset | Creates a dataset in the data factory. A dataset represents input/output for an activity in a pipeline. |
Set-AzDataFactoryV2Pipeline | Creates a pipeline in the data factory. A pipeline contains one or more activities that perform a certain operation. In this sample, the pipelines copy data from Azure SQL Database to Azure Synapse Analytics, staging the data in Azure Blob storage to enable PolyBase. |
Invoke-AzDataFactoryV2Pipeline | Creates a run for the pipeline. In other words, runs the pipeline. |
Get-AzDataFactoryV2ActivityRun | Gets details about the run of the activity (activity run) in the pipeline. |
Remove-AzResourceGroup | Deletes a resource group including all nested resources. |
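As a lighter-weight variation on the polling loop in the script, the overall pipeline run status can also be checked directly by run ID. This sketch assumes the $runId returned by Invoke-AzDataFactoryV2Pipeline and the variables set at the top of the script are still in scope:

```powershell
# Get the status of a single pipeline run by its run ID
$run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName -PipelineRunId $runId

# Status is one of: Queued, InProgress, Succeeded, Failed, Canceling, Cancelled
$run.Status
```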
Related content
For more information on Azure PowerShell, see the Azure PowerShell documentation.
For additional Azure Data Factory PowerShell script samples, see Azure Data Factory PowerShell samples.