Programmatically monitor an Azure Data Factory
APPLIES TO: Azure Data Factory Azure Synapse Analytics
Tip
Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!
This article describes how to monitor a pipeline in a data factory by using different software development kits (SDKs).
Note
We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.
Data range
Data Factory stores pipeline run data for only 45 days. When you query programmatically for data about Data Factory pipeline runs - for example, with the PowerShell command Get-AzDataFactoryV2PipelineRun - there are no maximum dates for the optional LastUpdatedAfter and LastUpdatedBefore parameters. But if you query for data for the past year, for example, the query doesn't return an error; it returns only pipeline run data from the last 45 days.
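For example, here's a minimal PowerShell sketch (assuming $resourceGroupName and $dataFactoryName are already set, as in the walkthroughs later in this article). Even though the window spans a full year, the cmdlet returns at most the last 45 days of runs.

# Query a full year of runs; only the last 45 days of pipeline run data come back.
Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -LastUpdatedAfter (Get-Date).AddYears(-1) `
    -LastUpdatedBefore (Get-Date)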
If you want to keep pipeline run data for more than 45 days, set up your own diagnostic logging with Azure Monitor.
Pipeline run information
For pipeline run properties, refer to the PipelineRun API reference. A pipeline run has different statuses during its lifecycle. The possible values of the run status are:
- Queued
- InProgress
- Succeeded
- Failed
- Canceling
- Cancelled
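You can filter run results on these status values. As a minimal PowerShell sketch (assuming the same $resourceGroupName and $dataFactoryName variables used later in this article), the following lists only the failed runs from the past day:

# List runs updated in the last day, then keep only the ones that failed.
Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -LastUpdatedAfter (Get-Date).AddDays(-1) `
    -LastUpdatedBefore (Get-Date) |
    Where-Object { $_.Status -eq "Failed" }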
.NET
For a complete walk-through of creating and monitoring a pipeline using the .NET SDK, see Create a data factory and pipeline using .NET.
Add the following code to continuously check the status of the pipeline run until it finishes copying the data.
// Monitor the pipeline run
Console.WriteLine("Checking pipeline run status...");
PipelineRun pipelineRun;
while (true)
{
    pipelineRun = client.PipelineRuns.Get(resourceGroup, dataFactoryName, runResponse.RunId);
    Console.WriteLine("Status: " + pipelineRun.Status);
    if (pipelineRun.Status == "InProgress" || pipelineRun.Status == "Queued")
        System.Threading.Thread.Sleep(15000);
    else
        break;
}
Add the following code to retrieve copy activity run details, for example, the size of the data read and written.
// Check the copy activity run details
Console.WriteLine("Checking copy activity run details...");

RunFilterParameters filterParams = new RunFilterParameters(
    DateTime.UtcNow.AddMinutes(-10), DateTime.UtcNow.AddMinutes(10));
ActivityRunsQueryResponse queryResponse = client.ActivityRuns.QueryByPipelineRun(
    resourceGroup, dataFactoryName, runResponse.RunId, filterParams);

if (pipelineRun.Status == "Succeeded")
    Console.WriteLine(queryResponse.Value.First().Output);
else
    Console.WriteLine(queryResponse.Value.First().Error);

Console.WriteLine("\nPress any key to exit...");
Console.ReadKey();
For complete documentation on the .NET SDK, see Data Factory .NET SDK reference.
Python
For a complete walk-through of creating and monitoring a pipeline using the Python SDK, see Create a data factory and pipeline using Python.
To monitor the pipeline run, add the following code:
# Monitor the pipeline run
time.sleep(30)
pipeline_run = adf_client.pipeline_runs.get(
    rg_name, df_name, run_response.run_id)
print("\n\tPipeline run status: {}".format(pipeline_run.status))
filter_params = RunFilterParameters(
    last_updated_after=datetime.now() - timedelta(1), last_updated_before=datetime.now() + timedelta(1))
query_response = adf_client.activity_runs.query_by_pipeline_run(
    rg_name, df_name, pipeline_run.run_id, filter_params)
print_activity_run_details(query_response.value[0])
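The snippet calls print_activity_run_details, a helper defined in the linked Python quickstart. A minimal sketch of such a helper might look like the following; the output field names (dataRead, dataWritten) are what a Copy activity typically reports, but treat them as assumptions rather than a guaranteed schema:

# Hypothetical helper (a sketch; the linked quickstart defines its own version).
def print_activity_run_details(activity_run):
    """Print details of a single activity run."""
    print("\n\tActivity run status: {}".format(activity_run.status))
    if activity_run.status == 'Succeeded':
        # Copy activity output typically reports bytes read and written.
        print("\tBytes read: {}".format(activity_run.output.get('dataRead')))
        print("\tBytes written: {}".format(activity_run.output.get('dataWritten')))
    else:
        print("\tErrors: {}".format(activity_run.error.get('message')))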
For complete documentation on the Python SDK, see Data Factory Python SDK reference.
REST API
For a complete walk-through of creating and monitoring a pipeline using the REST API, see Create a data factory and pipeline using the REST API.
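The scripts in this section assume that variables such as the subscription ID, resource group, factory name, $runId, $apiVersion, and $authHeader were set earlier, as in the linked walkthrough. As a minimal sketch (an assumption, not the walkthrough's exact code), the authorization header can be built with the Az module like this:

# Acquire a bearer token for the Azure Resource Manager endpoint used below (Azure China here)
# and build the header that Invoke-RestMethod expects. This is a sketch; the linked walkthrough
# constructs the header in its own way.
$token = (Get-AzAccessToken -ResourceUrl "https://management.chinacloudapi.cn/").Token
$authHeader = @{ "Authorization" = "Bearer $token" }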
Run the following script to continuously check the pipeline run status until it finishes copying the data.
$request = "https://management.chinacloudapi.cn/subscriptions/${subsId}/resourceGroups/${resourceGroup}/providers/Microsoft.DataFactory/factories/${dataFactoryName}/pipelineruns/${runId}?api-version=${apiVersion}"
while ($True) {
    $response = Invoke-RestMethod -Method GET -Uri $request -Header $authHeader
    Write-Host "Pipeline run status: " $response.Status -foregroundcolor "Yellow"

    if ( ($response.Status -eq "InProgress") -or ($response.Status -eq "Queued") ) {
        Start-Sleep -Seconds 15
    }
    else {
        $response | ConvertTo-Json
        break
    }
}
Run the following script to retrieve copy activity run details, for example, the size of the data read and written.
$request = "https://management.chinacloudapi.cn/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelineruns/${runId}/queryActivityruns?api-version=${apiVersion}&startTime="+(Get-Date).ToString('yyyy-MM-dd')+"&endTime="+(Get-Date).AddDays(1).ToString('yyyy-MM-dd')+"&pipelineName=Adfv2QuickStartPipeline"
$response = Invoke-RestMethod -Method POST -Uri $request -Header $authHeader
$response | ConvertTo-Json
For complete documentation on the REST API, see Data Factory REST API reference.
PowerShell
For a complete walk-through of creating and monitoring a pipeline using PowerShell, see Create a data factory and pipeline using PowerShell.
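The scripts below assume that $runId holds the ID returned when the pipeline was triggered, and that $resourceGroupName and $dataFactoryName are already set. A minimal sketch of triggering a run (the pipeline name Adfv2QuickStartPipeline is taken from the quickstart walkthrough and is an assumption here):

# Trigger the pipeline and capture the run ID that the monitoring scripts below use.
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $resourceGroupName `
    -DataFactoryName $dataFactoryName `
    -PipelineName "Adfv2QuickStartPipeline"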
Run the following script to continuously check the pipeline run status until it finishes copying the data.
while ($True) {
    $run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $runId

    if ($run) {
        if ( ($run.Status -ne "InProgress") -and ($run.Status -ne "Queued") ) {
            Write-Output ("Pipeline run finished. The status is: " + $run.Status)
            $run
            break
        }
        Write-Output ("Pipeline is running...status: " + $run.Status)
    }

    Start-Sleep -Seconds 30
}
Run the following script to retrieve copy activity run details, for example, the size of the data read and written.
Write-Host "Activity run details:" -foregroundcolor "Yellow"
$result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
$result

Write-Host "Activity 'Output' section:" -foregroundcolor "Yellow"
$result.Output -join "`r`n"

Write-Host "`nActivity 'Error' section:" -foregroundcolor "Yellow"
$result.Error -join "`r`n"
For complete documentation on PowerShell cmdlets, see Data Factory PowerShell cmdlet reference.
Related content
See the Monitor pipelines using Azure Monitor article to learn how to use Azure Monitor to monitor Data Factory pipelines.