Monitor Azure Data Explorer performance, health, and usage with metrics

Azure Data Explorer metrics provide key indicators of the health and performance of Azure Data Explorer cluster resources. Use the metrics detailed in this article as standalone metrics to monitor Azure Data Explorer cluster usage, health, and performance in your specific scenario. You can also use metrics as the basis for operational Azure Dashboards and Azure Alerts.

For more information about Azure Metrics Explorer, see Metrics Explorer.


Use metrics to monitor your Azure Data Explorer resources

  1. Sign in to the Azure portal.
  2. In the left-hand pane of your Azure Data Explorer cluster, search for metrics.
  3. Select Metrics to open the metrics pane and begin analysis on your cluster.

Work in the metrics pane

In the metrics pane, select specific metrics to track, choose how to aggregate your data, and create metric charts to view on your dashboard.

The Resource and Metric Namespace pickers are pre-selected for your Azure Data Explorer cluster. The numbers in the following image correspond to the numbered list below. They guide you through the different options for setting up and viewing your metrics.


  1. To create a metric chart, select a Metric name and the relevant Aggregation for each metric. For more information about different metrics, see supported Azure Data Explorer metrics.
  2. Select Add metric to see multiple metrics plotted in the same chart.
  3. Select + New chart to see multiple charts in one view.
  4. Use the time picker to change the time range (default: past 24 hours).
  5. Use Add filter and Apply splitting for metrics that have dimensions.
  6. Select Pin to dashboard to add your chart configuration to the dashboards so that you can view it again.
  7. Set New alert rule to visualize your metrics using the set criteria. The new alert rule will include your target resource, metric, splitting, and filter dimensions from your chart. Modify these settings in the alert rule creation pane.
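The aggregation, filter, and splitting options above can be illustrated with a small, self-contained sketch. It does not call Azure Monitor; the sample data, the `aggregate` helper, and its parameters are hypothetical and only mimic what the chart does when you pick an aggregation, add a filter, or apply splitting on a dimension.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (minute offset, value, dimensions).
# These are illustrative only, not real Azure Monitor output.
SAMPLES = [
    (0, 10.0, {"Database": "db1"}),
    (1, 30.0, {"Database": "db2"}),
    (5, 20.0, {"Database": "db1"}),
    (6, 40.0, {"Database": "db2"}),
]

# The aggregation names mirror the ones listed in the metric tables.
AGGREGATIONS = {"Avg": mean, "Max": max, "Min": min, "Sum": sum}


def aggregate(samples, aggregation, grain_minutes=5, split_by=None, filters=None):
    """Bucket samples by time grain, optionally filtering and splitting by a dimension."""
    buckets = defaultdict(list)
    for minute, value, dims in samples:
        # "Add filter": keep only samples whose dimensions match every filter.
        if filters and any(dims.get(k) != v for k, v in filters.items()):
            continue
        bucket_start = (minute // grain_minutes) * grain_minutes
        # "Apply splitting": one series per dimension value, else a single series.
        series = dims.get(split_by) if split_by else "all"
        buckets[(series, bucket_start)].append(value)
    func = AGGREGATIONS[aggregation]
    return {key: func(values) for key, values in buckets.items()}


if __name__ == "__main__":
    print(aggregate(SAMPLES, "Avg"))
    print(aggregate(SAMPLES, "Sum", split_by="Database"))
    print(aggregate(SAMPLES, "Max", filters={"Database": "db1"}))
```

Splitting produces one series per dimension value, while a filter narrows the data before aggregation, which is why the two options can be combined on the same chart.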

Supported Azure Data Explorer metrics

The Azure Data Explorer metrics give insight into both the overall performance and use of your resources, and into specific actions such as ingestion or query. The metrics in this article are grouped by usage type.

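The types of metrics are:

  • Cluster metrics
  • Export metrics
  • Ingestion metrics
  • Streaming ingest metrics
  • Query metrics
  • Materialized view metrics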

For an alphabetical list of Azure Monitor's metrics for Azure Data Explorer, see supported Azure Data Explorer cluster metrics.

Cluster metrics

The cluster metrics track the general health of the cluster. They cover, for example, resource and ingestion use and responsiveness.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Cache utilization | Percent | Avg, Max, Min | Percentage of allocated cache resources currently in use by the cluster. Cache is the size of SSD allocated for user activity according to the defined cache policy. An average cache utilization of 80% or less is a sustainable state for a cluster. If the average cache utilization is above 80%, the cluster should be scaled up to a storage optimized pricing tier or scaled out to more instances. Alternatively, adapt the cache policy (fewer days in cache). If cache utilization is over 100%, the size of data to be cached, according to the caching policy, is larger than the total size of cache on the cluster. | |
| CPU | Percent | Avg, Max, Min | Percentage of allocated compute resources currently in use by machines in the cluster. An average CPU of 80% or less is sustainable for a cluster. The maximum value of CPU is 100%, which means there are no additional compute resources to process data. When a cluster isn't performing well, check the maximum value of the CPU to determine if there are specific CPUs that are blocked. | |
| Ingestion utilization | Percent | Avg, Max, Min | Percentage of actual resources used to ingest data, out of the total resources allocated in the capacity policy to perform ingestion. The default capacity policy is no more than 512 concurrent ingestion operations or 75% of the cluster resources invested in ingestion. An average ingestion utilization of 80% or less is a sustainable state for a cluster. The maximum value of ingestion utilization is 100%, which means all cluster ingestion ability is used and an ingestion queue may result. | |
| InstanceCount | Count | Avg | Total instance count. | |
| Keep alive | Count | Avg | Tracks the responsiveness of the cluster. A fully responsive cluster returns value 1, and a blocked or disconnected cluster returns 0. | |
| Total number of throttled commands | Count | Avg, Max, Min, Sum | The number of throttled (rejected) commands in the cluster, because the maximum allowed number of concurrent (parallel) commands was reached. | None |
| Total number of extents | Count | Avg, Max, Min, Sum | Total number of data extents in the cluster. Changes in this metric can imply massive data structure changes and high load on the cluster, since merging data extents is a CPU-heavy activity. | |
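The utilization guidance for the cluster metrics (an average of 80% or less is sustainable, cache utilization over 100% means the data to cache exceeds the cache size, and Keep alive returns 1 for a fully responsive cluster) can be expressed as a simple check. This is a minimal sketch: the function names and the returned labels are hypothetical, and the input is assumed to be an already-aggregated average that you read from the metrics pane.

```python
SUSTAINABLE_AVG_PERCENT = 80.0  # sustainable average quoted for cache, CPU, and ingestion utilization


def assess_utilization(metric, avg_percent):
    """Classify an average utilization value against the documented guidance.

    `metric` is the metric display name; the returned labels are illustrative.
    """
    if metric == "Cache utilization" and avg_percent > 100:
        # Data to be cached under the caching policy exceeds the cluster's total cache.
        return "over capacity: data to cache exceeds total cache size"
    if avg_percent <= SUSTAINABLE_AVG_PERCENT:
        return "sustainable"
    return "above sustainable threshold"


def is_fully_responsive(keep_alive_avg):
    """Keep alive: 1 means fully responsive, 0 means blocked or disconnected."""
    return keep_alive_avg == 1


if __name__ == "__main__":
    print(assess_utilization("CPU", 72.5))
    print(assess_utilization("Cache utilization", 120.0))
    print(is_fully_responsive(1))
```

For a value above the threshold, the options named in the cache utilization description apply: scale up, scale out, or shorten the cache policy.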

Export metrics

Export metrics track the general health and performance of export operations, such as lateness, results, number of records, and utilization.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Continuous export number of exported records | Count | Sum | The number of exported records in all continuous export jobs. | ContinuousExportName |
| Continuous export max lateness | Count | Max | The lateness (in minutes) reported by the continuous export jobs in the cluster. | None |
| Continuous export pending count | Count | Max | The number of pending continuous export jobs. These jobs are ready to run but waiting in a queue, possibly due to insufficient capacity. | |
| Continuous export result | Count | Count | The Failure/Success result of each continuous export run. | ContinuousExportName |
| Export utilization | Percent | Max | Export capacity used, out of the total export capacity in the cluster (between 0 and 100). | None |

Ingestion metrics

Ingestion metrics track the general health and performance of ingestion operations, such as latency, results, and volume.


  • Apply filters to charts to plot partial data by dimensions. For example, explore ingestion to a specific Database.
  • Apply splitting to a chart to visualize data by different components. This process is useful for analyzing metrics that are reported by each step of the ingestion pipeline, for example Blobs received.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Batch blob count | Count | Avg, Max, Min | Number of data sources in a completed batch for ingestion. | Database |
| Batch duration | Seconds | Avg, Max, Min | The duration of the batching phase in the ingestion flow. | Database |
| Batch size | Bytes | Avg, Max, Min | Uncompressed expected data size in an aggregated batch for ingestion. | Database |
| Batches processed | Count | Sum, Max, Min | Number of batches completed for ingestion. Batching Type: whether completion of the batch was based on the batching time, data size, or number of files limit, as set by the batching policy. | Database, Batching Type |
| Blobs received | Count | Sum, Max, Min | Number of blobs received from input stream by a component. Use Apply splitting to analyze each component. | Database, Component Type, Component Name |
| Blobs processed | Count | Sum, Max, Min | Number of blobs processed by a component. Use Apply splitting to analyze each component. | Database, Component Type, Component Name |
| Blobs dropped | Count | Sum, Max, Min | Number of blobs permanently dropped by a component. For each such blob, an Ingestion result metric with a failure reason is sent. Use Apply splitting to analyze each component. | Database, Component Type, Component Name |
| Discovery latency | Seconds | Avg | Time from data enqueue until discovery by data connections. This time isn't included in the Stage latency or in the Ingestion latency metrics. | Component Type, Component Name |
| Events received | Count | Sum, Max, Min | Number of events received by data connections from input stream. | Component Type, Component Name |
| Events processed | Count | Sum, Max, Min | Number of events processed by data connections. | Component Type, Component Name |
| Events dropped | Count | Sum, Max, Min | Number of events permanently dropped by data connections. | Component Type, Component Name |
| Events processed (for Event/IoT Hubs) | Count | Max, Min, Sum | Total number of events read from Event Hubs and processed by the cluster. These events are split into two groups: events rejected, and events accepted by the cluster engine. | Status |
| Ingestion latency | Seconds | Avg, Max, Min | Latency of data ingested, from the time the data was received in the cluster until it's ready for query. The ingestion latency period depends on the ingestion scenario. | None |
| Ingestion result | Count | Sum | Total number of ingest operations that either failed or succeeded. Use Apply splitting to create buckets of success and failure results and analyze the dimensions (Value > Status). For more information about possible fail results, see Ingestion error codes in Azure Data Explorer. | |
| Ingestion volume (in MB) | Count | Max, Sum | The total size of data ingested to the cluster (in MB) before compression. | Database |
| Queue length | Count | Avg | Number of pending messages in a component's input queue. | Component Type |
| Queue oldest message | Seconds | Avg | Time in seconds since the oldest message in a component's input queue was inserted. | Component Type |
| Received data size bytes | Bytes | Avg, Sum | Size of data received by data connections from input stream. | Component Type, Component Name |
| Stage latency | Seconds | Avg | Time from when a message is discovered by Azure Data Explorer until its content is received by an ingestion component for processing. Use Add filter and select Component Type > EngineStorage to show the total ingestion latency. | Database, Component Type |
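The Ingestion result metric is meant to be split by status into success and failure buckets. The following sketch shows that bucketing on hypothetical data points. The status strings used here (including `"Success"` and the failure names) are assumptions for illustration; check the ingestion error codes reference for the actual values.

```python
from collections import Counter


def bucket_ingestion_results(points, success_status="Success"):
    """Sum (status, count) data points into 'success' and 'failure' buckets.

    `points` and `success_status` are hypothetical; they stand in for the
    series you get after splitting Ingestion result by its status dimension.
    """
    totals = Counter()
    for status, count in points:
        bucket = "success" if status == success_status else "failure"
        totals[bucket] += count
    return totals


def failure_ratio(totals):
    """Fraction of ingest operations that failed (0.0 when there is no data)."""
    total = totals["success"] + totals["failure"]
    return totals["failure"] / total if total else 0.0


if __name__ == "__main__":
    # Illustrative data points, not real cluster output.
    points = [("Success", 90), ("HypotheticalFailureA", 7), ("HypotheticalFailureB", 3)]
    totals = bucket_ingestion_results(points)
    print(dict(totals), failure_ratio(totals))
```

Because the metric uses the Sum aggregation, adding the per-status counts in each bucket matches how the chart totals the split series.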

Streaming ingest metrics

Streaming ingest metrics track streaming ingestion data and request rate, duration, and results.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Streaming Ingest Data Rate | Count | RateRequestsPerSecond | Total volume of data ingested to the cluster. | None |
| Streaming Ingest Duration | Milliseconds | Avg, Max, Min | Total duration of all streaming ingestion requests. | None |
| Streaming Ingest Request Rate | Count | Count, Avg, Max, Min, Sum | Total number of streaming ingestion requests. | None |
| Streaming Ingest Result | Count | Avg | Total number of streaming ingestion requests by result type. | Result |

Query metrics

Query performance metrics track query duration and the total number of concurrent or throttled queries.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Query duration | Milliseconds | Avg, Min, Max, Sum | Total time until query results are received (doesn't include network latency). | QueryStatus |
| QueryResult | Count | Count | Total number of queries. | QueryStatus |
| Total number of concurrent queries | Count | Avg, Max, Min, Sum | The number of queries run in parallel in the cluster. This metric is a good way to estimate the load on the cluster. | None |
| Total number of throttled queries | Count | Avg, Max, Min, Sum | The number of throttled (rejected) queries in the cluster. The maximum number of concurrent (parallel) queries allowed is defined in the request rate limit policy. | None |

Materialized view metrics

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| MaterializedViewHealth | 1, 0 | Avg | Value is 1 if the view is considered healthy, otherwise 0. | Database, MaterializedViewName |
| MaterializedViewAgeMinutes | Minutes | Avg | The age of the view is defined as the current time minus the last ingestion time processed by the view. Metric value is time in minutes (the lower the value, the "healthier" the view). | Database, MaterializedViewName |
| MaterializedViewResult | 1 | Avg | Metric includes a Result dimension indicating the result of the last materialization cycle (see the MaterializedViewResult metric for details about possible values). Metric value always equals 1. | Database, MaterializedViewName, Result |
| MaterializedViewRecordsInDelta | Records count | Avg | The number of records currently in the non-processed part of the source table. For more information, see how materialized views work. | Database, MaterializedViewName |
| MaterializedViewExtentsRebuild | Extents count | Avg | The number of extents rebuilt in the materialization cycle. | Database, MaterializedViewName |
| MaterializedViewDataLoss | 1 | Max | Metric is fired when unprocessed source data is approaching retention. Indicates that the materialized view is unhealthy. | Database, MaterializedViewName, Kind |
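The definition of MaterializedViewAgeMinutes (current time minus the last ingestion time processed by the view, in minutes) is a plain time difference. The sketch below shows the arithmetic only; the function name and the sample timestamps are hypothetical, and the service computes the real metric for you.

```python
from datetime import datetime, timezone


def view_age_minutes(now, last_processed_ingestion_time):
    """Age of a materialized view in minutes: now minus the last processed ingestion time."""
    return (now - last_processed_ingestion_time).total_seconds() / 60


if __name__ == "__main__":
    # Illustrative timestamps, not real cluster data.
    now = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
    last_processed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(view_age_minutes(now, last_processed))  # 30.0
```

A growing age means the view is falling behind its source table, which is why lower values indicate a healthier view.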

Next steps