Programmatically monitor an Azure data factory

APPLIES TO: ✔️ Azure Data Factory ✖️ Azure Synapse Analytics (Preview)

This article describes how to monitor a pipeline in a data factory by using different software development kits (SDKs).

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Data range

Data Factory stores pipeline run data for only 45 days. When you query programmatically for data about Data Factory pipeline runs - for example, with the PowerShell command Get-AzDataFactoryV2PipelineRun - there are no maximum dates for the optional LastUpdatedAfter and LastUpdatedBefore parameters. But if you query for data from the past year, for example, the query does not return an error; it returns only pipeline run data from the last 45 days, as the sketch below illustrates.
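
For illustration, here is a minimal PowerShell sketch that queries a full year of runs. The resource group and factory names are placeholders, and an authenticated Az session is assumed. The call succeeds, but the result is still capped at the 45-day window.

# Placeholder names; substitute your own resource group and data factory.
$runs = Get-AzDataFactoryV2PipelineRun -ResourceGroupName "myResourceGroup" `
    -DataFactoryName "myDataFactory" `
    -LastUpdatedAfter (Get-Date).AddDays(-365) `
    -LastUpdatedBefore (Get-Date)
# No error is returned, but $runs contains only runs updated in the last 45 days.
$runs | Measure-Object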

If you want to persist pipeline run data for more than 45 days, set up your own diagnostic logging with Azure Monitor, as sketched below.
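
As a minimal sketch, assuming an existing Log Analytics workspace, diagnostic settings can be created with the Az.Monitor cmdlet Set-AzDiagnosticSetting. Both resource IDs below are placeholders, and the category names shown are the Data Factory log categories at the time of writing.

# Placeholder resource IDs; substitute your own data factory and workspace.
$factoryId = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.DataFactory/factories/<factory-name>"
$workspaceId = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"

# Route pipeline, activity, and trigger run logs to the workspace for long-term retention.
Set-AzDiagnosticSetting -ResourceId $factoryId -WorkspaceId $workspaceId `
    -Enabled $true -Category "PipelineRuns","ActivityRuns","TriggerRuns"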

.NET

For a complete walkthrough of creating and monitoring a pipeline using the .NET SDK, see Create a data factory and pipeline using .NET.

  1. Add the following code to continuously check the status of the pipeline run until it finishes copying the data.

    // Monitor the pipeline run
    Console.WriteLine("Checking pipeline run status...");
    PipelineRun pipelineRun;
    while (true)
    {
        pipelineRun = client.PipelineRuns.Get(resourceGroup, dataFactoryName, runResponse.RunId);
        Console.WriteLine("Status: " + pipelineRun.Status);
        if (pipelineRun.Status == "InProgress")
            System.Threading.Thread.Sleep(15000);
        else
            break;
    }
    
  2. Add the following code that retrieves copy activity run details, for example, the size of the data read/written.

    // Check the copy activity run details
    Console.WriteLine("Checking copy activity run details...");
    
    List<ActivityRun> activityRuns = client.ActivityRuns.ListByPipelineRun(
        resourceGroup, dataFactoryName, runResponse.RunId,
        DateTime.UtcNow.AddMinutes(-10), DateTime.UtcNow.AddMinutes(10)).ToList();
    if (pipelineRun.Status == "Succeeded")
        Console.WriteLine(activityRuns.First().Output);
    else
        Console.WriteLine(activityRuns.First().Error);
    Console.WriteLine("\nPress any key to exit...");
    Console.ReadKey();
    

For complete documentation on the .NET SDK, see the Data Factory .NET SDK reference.

Python

For a complete walkthrough of creating and monitoring a pipeline using the Python SDK, see Create a data factory and pipeline using Python.

To monitor the pipeline run, add the following code:

# Monitor the pipeline run. Imports are shown here for completeness; adf_client,
# rg_name, df_name, run_response, and print_activity_run_details are defined
# earlier in the walkthrough.
import time
from datetime import datetime, timedelta

time.sleep(30)  # give the pipeline run a moment to start before checking status
pipeline_run = adf_client.pipeline_runs.get(
    rg_name, df_name, run_response.run_id)
print("\n\tPipeline run status: {}".format(pipeline_run.status))
activity_runs_paged = list(adf_client.activity_runs.list_by_pipeline_run(
    rg_name, df_name, pipeline_run.run_id, datetime.now() - timedelta(1),  datetime.now() + timedelta(1)))
print_activity_run_details(activity_runs_paged[0])

For complete documentation on the Python SDK, see the Data Factory Python SDK reference.

REST API

For a complete walkthrough of creating and monitoring a pipeline using the REST API, see Create a data factory and pipeline using REST API.

  1. Run the following script to continuously check the pipeline run status until it finishes copying the data.

    $request = "https://management.chinacloudapi.cn/subscriptions/${subsId}/resourceGroups/${resourceGroup}/providers/Microsoft.DataFactory/factories/${dataFactoryName}/pipelineruns/${runId}?api-version=${apiVersion}"
    while ($True) {
        $response = Invoke-RestMethod -Method GET -Uri $request -Header $authHeader
        Write-Host  "Pipeline run status: " $response.Status -foregroundcolor "Yellow"
    
        if ($response.Status -eq "InProgress") {
            Start-Sleep -Seconds 15
        }
        else {
            $response | ConvertTo-Json
            break
        }
    }
    
  2. Run the following script to retrieve copy activity run details, for example, the size of the data read/written.

    $request = "https://management.chinacloudapi.cn/subscriptions/${subsId}/resourceGroups/${resourceGroup}/providers/Microsoft.DataFactory/factories/${dataFactoryName}/pipelineruns/${runId}/activityruns?api-version=${apiVersion}&startTime="+(Get-Date).ToString('yyyy-MM-dd')+"&endTime="+(Get-Date).AddDays(1).ToString('yyyy-MM-dd')+"&pipelineName=Adfv2QuickStartPipeline"
    $response = Invoke-RestMethod -Method GET -Uri $request -Header $authHeader
    $response | ConvertTo-Json
    

For complete documentation on the REST API, see the Data Factory REST API reference.

PowerShell

For a complete walkthrough of creating and monitoring a pipeline using PowerShell, see Create a data factory and pipeline using PowerShell.

  1. Run the following script to continuously check the pipeline run status until it finishes copying the data.

    while ($True) {
        $run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $runId
    
        if ($run) {
            if ($run.Status -ne 'InProgress') {
                Write-Host "Pipeline run finished. The status is: " $run.Status -foregroundcolor "Yellow"
                $run
                break
            }
            Write-Host  "Pipeline is running...status: InProgress" -foregroundcolor "Yellow"
        }
    
        Start-Sleep -Seconds 30
    }
    
  2. Run the following script to retrieve copy activity run details, for example, the size of the data read/written.

    Write-Host "Activity run details:" -foregroundcolor "Yellow"
    $result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
    $result
    
    Write-Host "Activity 'Output' section:" -foregroundcolor "Yellow"
    $result.Output -join "`r`n"
    
    Write-Host "\nActivity 'Error' section:" -foregroundcolor "Yellow"
    $result.Error -join "`r`n"
    

For complete documentation on PowerShell cmdlets, see the Data Factory PowerShell cmdlet reference.

Next steps

See the Monitor pipelines using Azure Monitor article to learn about using Azure Monitor to monitor Data Factory pipelines.