Monitor copy activity

APPLIES TO: Azure Data Factory, Azure Synapse Analytics (Preview)

This article outlines how to monitor the copy activity execution in Azure Data Factory. It builds on the copy activity overview article, which presents a general overview of the copy activity.

Monitor visually

Once you've created and published a pipeline in Azure Data Factory, you can associate it with a trigger or manually kick off an ad hoc run. You can monitor all of your pipeline runs natively in the Azure Data Factory user experience. Learn about Azure Data Factory monitoring in general from Visually monitor Azure Data Factory.

To monitor the copy activity run, go to your data factory's Author & Monitor UI. On the Monitor tab, you see a list of pipeline runs; click the pipeline name link to access the list of activity runs in that pipeline run.

Monitor pipeline runs

At this level, you can see links to the copy activity input, output, and errors (if the copy activity run fails), as well as statistics like duration/status. Clicking the Details button (eyeglasses) next to the copy activity name will give you deep details on your copy activity execution.

Monitor copy activity runs

In this graphical monitoring view, Azure Data Factory presents the copy activity execution information, including data read/written volume, number of files/rows of data copied from source to sink, throughput, the configurations applied for your copy scenario, the steps the copy activity goes through with corresponding durations and details, and more. Refer to the table below for each possible metric and its detailed description.

In some scenarios, when you run a copy activity in Data Factory, you'll see "Performance tuning tips" at the top of the copy activity monitoring view, as shown in the example. The tips tell you the bottleneck identified by ADF for the specific copy run, along with suggestions on what to change to boost copy throughput. Learn more about auto performance tuning tips.

The execution details and durations at the bottom describe the key steps your copy activity goes through, which is especially useful for troubleshooting copy performance. The bottleneck of your copy run is the step with the longest duration. Refer to Troubleshoot copy activity performance for what each stage represents and for detailed troubleshooting guidance.

Example: Copy from Amazon S3 to Azure Data Lake Storage Gen2

Monitor copy activity run details

Monitor programmatically

Copy activity execution details and performance characteristics are also returned in the Copy Activity run result > Output section, which is used to render the UI monitoring view. Following is a complete list of properties that might be returned. You'll see only the properties that are applicable to your copy scenario. For information about how to monitor activity runs programmatically in general, see Programmatically monitor an Azure data factory; a short sketch of retrieving this output with the Python SDK follows the table.

Property name | Description | Unit in output
dataRead | The actual amount of data read from the source. | Int64 value, in bytes
dataWritten | The actual amount of data written/committed to the sink. The size may be different from the dataRead size, as it relates to how each data store stores the data. | Int64 value, in bytes
filesRead | The number of files read from the file-based source. | Int64 value (no unit)
filesWritten | The number of files written/committed to the file-based sink. | Int64 value (no unit)
filesSkipped | The number of files skipped from the file-based source. | Int64 value (no unit)
dataConsistencyVerification | Details of data consistency verification, where you can see whether your copied data has been verified to be consistent between the source and destination store. Learn more from this article. | Array
sourcePeakConnections | Peak number of concurrent connections established to the source data store during the copy activity run. | Int64 value (no unit)
sinkPeakConnections | Peak number of concurrent connections established to the sink data store during the copy activity run. | Int64 value (no unit)
rowsRead | Number of rows read from the source. This metric does not apply when copying files as-is without parsing them, for example, when the source and sink datasets are binary format type, or another format type with identical settings. | Int64 value (no unit)
rowsCopied | Number of rows copied to the sink. This metric does not apply when copying files as-is without parsing them, for example, when the source and sink datasets are binary format type, or another format type with identical settings. | Int64 value (no unit)
rowsSkipped | Number of incompatible rows that were skipped. You can enable incompatible rows to be skipped by setting enableSkipIncompatibleRow to true. | Int64 value (no unit)
copyDuration | Duration of the copy run. | Int32 value, in seconds
throughput | Rate of data transfer. | Floating point number, in KBps
sqlDwPolyBase | Whether PolyBase is used when data is copied into Azure Synapse Analytics (formerly SQL Data Warehouse). | Boolean
redshiftUnload | Whether UNLOAD is used when data is copied from Redshift. | Boolean
hdfsDistcp | Whether DistCp is used when data is copied from HDFS. | Boolean
effectiveIntegrationRuntime | The integration runtime (IR) or runtimes used to power the activity run, in the format <IR name> (<region if it's Azure IR>). | Text (string)
usedDataIntegrationUnits | The effective Data Integration Units during the copy. | Int32 value
usedParallelCopies | The effective parallelCopies during the copy. | Int32 value
logPath | Path to the session log of skipped data in the blob storage. See Fault tolerance. | Text (string)
executionDetails | More details on the stages the copy activity goes through, and the corresponding steps, durations, configurations, and so on. We don't recommend that you parse this section because it might change. To better understand how it helps you understand and troubleshoot copy performance, refer to the Monitor visually section. | Array
perfRecommendation | Copy performance tuning tips. See Performance tuning tips for details. | Array
billingReference | The billing consumption for the given run. | Object
durationInQueue | Queuing duration, in seconds, before the copy activity starts to execute. | Object
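
As one way to retrieve these properties, the following minimal sketch assumes the azure-identity and azure-mgmt-datafactory Python packages are installed and queries the activity runs of a pipeline run, then reads the copy activity output. The subscription ID, resource group, factory name, and pipeline run ID are placeholders to substitute with your own values.

# A minimal sketch, not the authoritative method: requires the azure-identity and
# azure-mgmt-datafactory packages; all "<...>" values are placeholders.
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Query all activity runs that belong to one pipeline run, bounded by an update-time window.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1),
)
response = adf_client.activity_runs.query_by_pipeline_run(
    "<resource-group>", "<factory-name>", "<pipeline-run-id>", filters
)

# For copy activity runs, the output carries the properties listed in the table above.
for run in response.value:
    if run.activity_type == "Copy":
        output = run.output or {}
        print(run.activity_name, run.status)
        print("  dataRead (bytes):", output.get("dataRead"))
        print("  throughput (KBps):", output.get("throughput"))
        print("  copyDuration (s):", output.get("copyDuration"))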

Example output:

"output": {
    "dataRead": 1180089300500,
    "dataWritten": 1180089300500,
    "filesRead": 110,
    "filesWritten": 110,
    "filesSkipped": 0,
    "sourcePeakConnections": 640,
    "sinkPeakConnections": 1024,
    "copyDuration": 388,
    "throughput": 2970183,
    "errors": [],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (China East 2)",
    "usedDataIntegrationUnits": 128,
    "billingReference": "{\"activityType\":\"DataMovement\",\"billableDuration\":[{\"Managed\":11.733333333333336}]}",
    "usedParallelCopies": 64,
    "dataConsistencyVerification": 
    { 
        "VerificationResult": "Verified", 
        "InconsistentData": "None" 
    },
    "executionDetails": [
        {
            "source": {
                "type": "AmazonS3"
            },
            "sink": {
                "type": "AzureBlobFS",
                "region": "China East 2",
                "throttlingErrors": 6
            },
            "status": "Succeeded",
            "start": "2020-03-04T02:13:25.1454206Z",
            "duration": 388,
            "usedDataIntegrationUnits": 128,
            "usedParallelCopies": 64,
            "profile": {
                "queue": {
                    "status": "Completed",
                    "duration": 2
                },
                "transfer": {
                    "status": "Completed",
                    "duration": 386,
                    "details": {
                        "listingSource": {
                            "type": "AmazonS3",
                            "workingDuration": 0
                        },
                        "readingFromSource": {
                            "type": "AmazonS3",
                            "workingDuration": 301
                        },
                        "writingToSink": {
                            "type": "AzureBlobFS",
                            "workingDuration": 335
                        }
                    }
                }
            },
            "detailedDurations": {
                "queuingDuration": 2,
                "transferDuration": 386
            }
        }
    ],
    "perfRecommendation": [
        {
            "Tip": "6 write operations were throttled by the sink data store. To achieve better performance, you are suggested to check and increase the allowed request rate for Azure Data Lake Storage Gen2, or reduce the number of concurrent copy runs and other data access, or reduce the DIU or parallel copy.",
            "ReferUrl": "https://docs.azure.cn/zh-cn/data-factory/copy-activity-performance#performance-tuning-steps",
            "RuleName": "ReduceThrottlingErrorPerfRecommendationRule"
        }
    ],
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}
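
If you do inspect this output programmatically, the durations under executionDetails can point you to the slowest step, and perfRecommendation carries the same tuning tips shown in the UI. The helper below is illustrative only: the function name is made up for this example, it expects the copy activity output as a Python dictionary (such as run.output from the sketch earlier), and, as the table above notes, the executionDetails layout might change.

# Illustrative only: summarize a copy activity output dictionary shaped like the
# example above. The executionDetails layout is not guaranteed to stay stable.
def summarize_copy_output(output: dict) -> None:
    details = output["executionDetails"][0]
    steps = details["profile"]["transfer"]["details"]

    # The step with the longest working duration is the likely bottleneck.
    name, info = max(steps.items(), key=lambda kv: kv[1].get("workingDuration", 0))
    print(f"Slowest step: {name} ({info['workingDuration']} s of {details['duration']} s total)")

    # Surface any performance tuning tips returned for this run.
    for tip in output.get("perfRecommendation", []):
        print("Tip:", tip["Tip"])

# Usage: summarize_copy_output(run.output) with the output retrieved by the SDK sketch above.

For the example output shown, this would report writingToSink as the slowest step and print the throttling tip returned under perfRecommendation.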

Next steps

See the other copy activity articles:

- Copy activity overview

- Copy activity performance