Monitor Azure Data Explorer performance, health, and usage with metrics

Azure Data Explorer metrics provide key indicators of the health and performance of the Azure Data Explorer cluster resources. Use the metrics detailed in this article as standalone metrics to monitor Azure Data Explorer cluster usage, health, and performance in your specific scenario. You can also use metrics as the basis for operational Azure Dashboards and Azure Alerts.

For more information about Azure Metrics Explorer, see Metrics Explorer.

Prerequisites

Use metrics to monitor your Azure Data Explorer resources

  1. Sign in to the Azure portal.
  2. In the left-hand pane of your Azure Data Explorer cluster, search for metrics.
  3. Select Metrics to open the metrics pane and begin analysis on your cluster.

Work in the metrics pane

In the metrics pane, select specific metrics to track, choose how to aggregate your data, and create metric charts to view on your dashboard.

The Resource and Metric Namespace pickers are pre-selected for your Azure Data Explorer cluster. The numbers in the following image correspond to the numbered list below. They guide you through the different options for setting up and viewing your metrics.

[Image: Metrics pane]

  1. To create a metric chart, select Metric name and the relevant Aggregation per metric. For more information about different metrics, see supported Azure Data Explorer metrics.
  2. Select Add metric to see multiple metrics plotted in the same chart.
  3. Select + New chart to see multiple charts in one view.
  4. Use the time picker to change the time range (default: past 24 hours).
  5. Use Add filter and Apply splitting for metrics that have dimensions.
  6. Select Pin to dashboard to add your chart configuration to the dashboards so that you can view it again.
  7. Select New alert rule to visualize your metrics using the set criteria. The new alert rule includes the target resource, metric, splitting, and filter dimensions from your chart. Modify these settings in the alert rule creation pane.

Supported Azure Data Explorer metrics

The Azure Data Explorer metrics give insight into both the overall performance and use of your resources and specific actions, such as ingestion or query. The metrics in this article are grouped by usage type.

The types of metrics are: cluster metrics, export metrics, ingestion metrics, streaming ingest metrics, and query metrics.

For an alphabetical list of Azure Monitor's metrics for Azure Data Explorer, see supported Azure Data Explorer cluster metrics.

Cluster metrics

The cluster metrics track the general health of the cluster, for example, resource and ingestion use and responsiveness.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Cache utilization | Percent | Avg, Max, Min | Percentage of allocated cache resources currently in use by the cluster. Cache is the size of SSD allocated for user activity according to the defined cache policy. An average cache utilization of 80% or less is a sustainable state for a cluster. If the average cache utilization is above 80%, the cluster should be scaled up to a storage optimized pricing tier or scaled out to more instances. Alternatively, adapt the cache policy (fewer days in cache). If cache utilization is over 100%, the size of data to be cached, according to the caching policy, is larger than the total size of cache on the cluster. | None |
| CPU | Percent | Avg, Max, Min | Percentage of allocated compute resources currently in use by machines in the cluster. An average CPU of 80% or less is sustainable for a cluster. The maximum value of CPU is 100%, which means there are no additional compute resources to process data. When a cluster isn't performing well, check the maximum value of the CPU to determine if there are specific CPUs that are blocked. | None |
| Ingestion utilization | Percent | Avg, Max, Min | Percentage of actual resources used to ingest data, out of the total resources allocated in the capacity policy to perform ingestion. The default capacity policy is no more than 512 concurrent ingestion operations or 75% of the cluster resources invested in ingestion. An average ingestion utilization of 80% or less is a sustainable state for a cluster. The maximum value of ingestion utilization is 100%, which means all cluster ingestion ability is used and an ingestion queue may result. | None |
| Keep alive | Count | Avg | Tracks the responsiveness of the cluster. A fully responsive cluster returns value 1 and a blocked or disconnected cluster returns 0. | |
| Total number of throttled commands | Count | Avg, Max, Min, Sum | The number of throttled (rejected) commands in the cluster, because the maximum allowed number of concurrent (parallel) commands was reached. | None |
| Total number of extents | Count | Avg, Max, Min, Sum | Total number of data extents in the cluster. Changes in this metric can imply massive data structure changes and high load on the cluster, since merging data extents is a CPU-heavy activity. | None |
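
The 80% guidance for cache, CPU, and ingestion utilization can be expressed as a small check. The following is a minimal Python sketch, not part of any Azure SDK: the function name `assess_utilization`, the metric labels, and the returned status strings are all hypothetical, and the input is assumed to be an average percentage you have already read from the metrics pane.

```python
# Hypothetical helper: names and status strings are illustrative only.
SUSTAINABLE_AVG_PERCENT = 80.0  # threshold taken from the cluster metrics guidance


def assess_utilization(metric: str, avg_percent: float) -> str:
    """Classify an average utilization value against the 80% guidance.

    metric is a free-form label such as "cache", "cpu", or "ingestion".
    """
    if avg_percent < 0:
        raise ValueError("utilization cannot be negative")
    # Only cache utilization can exceed 100%: the data to be cached is then
    # larger than the total cache on the cluster.
    if metric == "cache" and avg_percent > 100.0:
        return "over-capacity"
    if avg_percent > SUSTAINABLE_AVG_PERCENT:
        # Above 80% on average: consider scaling up or out, or, for cache,
        # adapting the cache policy.
        return "review"
    return "sustainable"


if __name__ == "__main__":
    for name, value in [("cache", 75.0), ("cpu", 92.0), ("cache", 120.0)]:
        print(name, value, assess_utilization(name, value))
```

A status of "review" only flags that the average is above the sustainable range; which remedy applies still depends on the metric and the workload.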

Export metrics

Export metrics track the general health and performance of export operations, such as lateness, results, number of records, and utilization.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Continuous export number of exported records | Count | Sum | The number of exported records in all continuous export jobs. | None |
| Continuous export max lateness | Count | Max | The lateness (in minutes) reported by the continuous export jobs in the cluster. | None |
| Continuous export pending count | Count | Max | The number of pending continuous export jobs. These jobs are ready to run but waiting in a queue (possibly due to insufficient capacity). | |
| Continuous export result | Count | Count | The Failure/Success result of each continuous export run. | ContinuousExportName |
| Export utilization | Percent | Max | Export capacity used, out of the total export capacity in the cluster (between 0 and 100). | None |

Ingestion metrics

Ingestion metrics track the general health and performance of ingestion operations, such as latency, results, and volume.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Batch blob count | Count | Avg, Max, Min | Number of data sources in a completed batch for ingestion. | Database |
| Batch duration | Seconds | Avg, Max, Min | The duration of the batching phase in the ingestion flow. | Database |
| Batch size | Bytes | Avg, Max, Min | Uncompressed expected data size in an aggregated batch for ingestion. | Database |
| Batches processed | Count | Avg, Max, Min | Number of batches completed for ingestion. BatchCompletionReason: whether the batch reached the batching time, data size, or number of files limit set by the batching policy. | Database, BatchCompletionReason |
| Discovery latency | Seconds | Avg, Max, Min | Time from data enqueue until discovery by data connection. This time is not included in the Kusto total ingestion duration or the KustoEventAge (Ingestion Latency). | Database, Table, Data connection type, Data connection name |
| Events processed (for Event/IoT Hubs) | Count | Max, Min, Sum | Total number of events read from event hubs and processed by the cluster. The events are split into events rejected and events accepted by the cluster engine. | EventStatus |
| Ingestion latency | Seconds | Avg, Max, Min | Latency of data ingested, from the time the data was received in the cluster until it's ready for query. The ingestion latency period depends on the ingestion scenario. | None |
| Ingestion result | Count | Count | Total number of ingestion operations that failed and succeeded. Use apply splitting to create buckets of success and fail results and analyze the dimensions (Value > Status). | IngestionResultDetails |
| Ingestion volume (in MB) | Count | Max, Sum | The total size of data ingested to the cluster (in MB) before compression. | Database |
| Stage latency | Seconds | Avg, Max, Min | Duration for a particular component to process this batch of data. The total stage latency for all components of a batch of data is equal to its ingestion latency. | Database, Data connection type, Data connection name |
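
The relationship between stage latency and ingestion latency stated above is a plain sum over components. As a minimal sketch, assuming you have already collected per-component stage latencies for one batch, the helper below adds them up. The function name and the component names in the example are hypothetical placeholders, not names reported by Azure Data Explorer.

```python
import math
from typing import Mapping


def ingestion_latency_seconds(stage_latencies: Mapping[str, float]) -> float:
    """Sum per-component stage latencies (seconds) for a single batch.

    Per the table above, the total stage latency across all components of a
    batch equals that batch's ingestion latency. Discovery latency is not a
    stage here, since it is excluded from the ingestion latency.
    """
    return math.fsum(stage_latencies.values())


if __name__ == "__main__":
    # Hypothetical component names and values, for illustration only.
    batch = {"component_a": 30.0, "component_b": 12.5, "component_c": 4.0}
    print(ingestion_latency_seconds(batch))
```

Comparing the individual values before summing shows which component contributes most to the latency of a batch.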

Streaming ingest metrics

Streaming ingest metrics track streaming ingestion data and request rate, duration, and results.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Streaming Ingest Data Rate | Count | RateRequestsPerSecond | Total volume of data ingested to the cluster. | None |
| Streaming Ingest Duration | Milliseconds | Avg, Max, Min | Total duration of all streaming ingestion requests. | None |
| Streaming Ingest Request Rate | Count | Count, Avg, Max, Min, Sum | Total number of streaming ingestion requests. | None |
| Streaming Ingest Result | Count | Avg | Total number of streaming ingestion requests by result type. | Result |

Query metrics

Query performance metrics track query duration and the total number of concurrent or throttled queries.

| Metric | Unit | Aggregation | Metric description | Dimensions |
|---|---|---|---|---|
| Query duration | Milliseconds | Avg, Min, Max, Sum | Total time until query results are received (doesn't include network latency). | QueryStatus |
| Total number of concurrent queries | Count | Avg, Max, Min, Sum | The number of queries run in parallel in the cluster. This metric is a good way to estimate the load on the cluster. | None |
| Total number of throttled queries | Count | Avg, Max, Min, Sum | The number of throttled (rejected) queries in the cluster. The maximum number of concurrent (parallel) queries allowed is defined in the concurrent query policy. | None |

Next steps