How to view metrics in real time

The Container insights Live Data (preview) feature lets you visualize metrics about node and pod state in a cluster in real time. It emulates direct access to the kubectl top nodes, kubectl get pods --all-namespaces, and kubectl get nodes commands to call, parse, and visualize the data in the performance charts included with this insight.

This article provides a detailed overview of the feature and explains how to use it.

Note

This feature does not support AKS clusters enabled as private clusters. The feature relies on accessing the Kubernetes API directly from your browser through a proxy server. Enabling network security that blocks the Kubernetes API from this proxy will block this traffic.

For help with setting up or troubleshooting the Live Data (preview) feature, review our setup guide.

How it works

The Live Data (preview) feature accesses the Kubernetes API directly. Additional information about the authentication model can be found here.

The feature polls the metrics endpoints (including /api/v1/nodes, /apis/metrics.k8s.io/v1beta1/nodes, and /api/v1/pods) every five seconds by default. When you set the Live checkbox to On, this data is cached in your browser and charted in the four performance charts that Container insights includes on the Cluster tab. Each subsequent poll is charted into a rolling five-minute visualization window.
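To make the rolling window concrete, here is a minimal Python sketch of an in-memory buffer that keeps only the last five minutes of polled samples. The RollingWindow class and its methods are hypothetical; this models the behavior described above and is not how the Azure portal implements it. Timestamps are passed in explicitly so the ten-minute simulation is deterministic.

```python
import time
from collections import deque
from typing import Deque, List, Optional, Tuple

WINDOW_SECONDS = 300  # five-minute rolling visualization window
POLL_SECONDS = 5      # default polling interval described in this article


class RollingWindow:
    """Hypothetical in-memory buffer that keeps only the most recent samples."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        # Each entry is (timestamp_in_seconds, value); nothing is written to disk.
        self._samples: Deque[Tuple[float, float]] = deque()

    def add(self, value: float, now: Optional[float] = None) -> None:
        """Record one polled value and evict anything older than the window."""
        now = time.time() if now is None else now
        self._samples.append((now, value))
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def values(self) -> List[float]:
        return [value for _, value in self._samples]


# Simulate ten minutes of polling every POLL_SECONDS, using the timestamp as the value.
window = RollingWindow()
for t in range(0, 601, POLL_SECONDS):
    window.add(float(t), now=float(t))

print(len(window.values()), min(window.values()), max(window.values()))
```

After ten simulated minutes only the samples from the last five minutes (t = 300 s through 600 s) remain; everything older has been dropped, mirroring the retention behavior the article describes.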

Figure: Live option in the Cluster view

The polling interval is configured from the Set interval drop-down, which lets you poll for new data every 1, 5, 15, or 30 seconds.

Figure: Live polling interval drop-down

Important

We recommend using the one-second polling interval only for short periods while troubleshooting an issue. These frequent requests can affect the availability of the Kubernetes API on your cluster and can cause throttling. Afterwards, reconfigure a longer polling interval.
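A rough way to see why the interval matters is to estimate the request rate one open browser session generates. The Python sketch below assumes that each poll issues exactly one request to each of the three endpoints named in this article; the portal's actual request pattern may differ, so treat the numbers as an order-of-magnitude estimate.

```python
# Assumption: one request per endpoint per poll (illustrative, not documented behavior).
ENDPOINTS = [
    "/api/v1/nodes",
    "/apis/metrics.k8s.io/v1beta1/nodes",
    "/api/v1/pods",
]
INTERVALS_SECONDS = [1, 5, 15, 30]  # values offered by the Set interval drop-down


def requests_per_minute(interval_seconds: int, endpoints: int = len(ENDPOINTS)) -> float:
    """Approximate Kubernetes API requests per minute for one browser session."""
    return (60 / interval_seconds) * endpoints


for interval in INTERVALS_SECONDS:
    print(f"{interval:>2} s interval: ~{requests_per_minute(interval):.0f} requests/minute")
```

Under this assumption, a one-second interval produces about 180 requests per minute per session, versus about 6 at 30 seconds, which is why the shortest interval is best reserved for brief troubleshooting.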

Important

No data is stored permanently while this feature is running. All information captured during the session is deleted immediately when you close your browser or navigate away from the feature. Data remains available for visualization only inside the five-minute window; any metrics older than five minutes are also permanently deleted.

You cannot pin these charts to the last Azure dashboard you viewed in live mode.

Metrics captured

Node CPU utilization % / Node memory utilization %

These two performance charts are equivalent to running kubectl top nodes and charting the values in the CPU% and MEMORY% columns, respectively.
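To show what "capturing the CPU% and MEMORY% columns" means, here is a small Python sketch that parses kubectl top nodes text output into per-node percentages. The sample output, node names, and the parse_top_nodes helper are made up for illustration; the sketch locates columns by header name and skips rows whose metrics are not yet available.

```python
from typing import Dict, Tuple

# Made-up example output; node names and values are illustrative only.
SAMPLE_TOP_NODES = """\
NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-nodepool1-12345678-vmss000000   250m         13%    1200Mi          26%
aks-nodepool1-12345678-vmss000001   90m          4%     950Mi           20%
aks-nodepool1-12345678-vmss000002   <unknown>    <unknown>   <unknown>   <unknown>
"""


def parse_top_nodes(output: str) -> Dict[str, Tuple[float, float]]:
    """Return {node_name: (cpu_percent, memory_percent)} from kubectl top nodes text."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    # Locate the columns by name instead of assuming fixed positions.
    cpu_idx = header.index("CPU%")
    mem_idx = header.index("MEMORY%")
    result: Dict[str, Tuple[float, float]] = {}
    for line in lines[1:]:
        cols = line.split()
        cpu, mem = cols[cpu_idx], cols[mem_idx]
        if not (cpu.endswith("%") and mem.endswith("%")):
            continue  # skip nodes whose metrics are not available yet
        result[cols[0]] = (float(cpu.rstrip("%")), float(mem.rstrip("%")))
    return result


print(parse_top_nodes(SAMPLE_TOP_NODES))
```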

Figure: Example kubectl top nodes results

Figure: Node CPU utilization percent chart

Figure: Node memory utilization percent chart

The percentile calculations are useful in larger clusters for identifying outlier nodes, for example, to determine whether nodes are under-utilized and the cluster could be scaled down. Using the Min aggregation, you can see which nodes in the cluster have low utilization. To investigate further, select the Nodes tab and sort the grid by CPU or memory utilization.

The charts also help you understand which nodes are being pushed to their limits and whether scaling out may be required. Using both the Max and P95 aggregations can show whether any nodes in the cluster have high resource utilization. To investigate further, switch to the Nodes tab again.
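The Min, P95, and Max aggregations can be sketched in Python over a set of per-node CPU percentages. The article does not say which percentile method the charts use; this sketch assumes the nearest-rank method, and the node names and values are made up. Note that with nearest-rank and fewer than 20 nodes, P95 always equals Max, which is one reason percentiles mainly help in larger clusters.

```python
import math
from typing import Dict, List


def nearest_rank_percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(cpu_by_node: Dict[str, float]) -> Dict[str, object]:
    values = list(cpu_by_node.values())
    return {
        "Min": min(values),
        "P95": nearest_rank_percentile(values, 95),
        "Max": max(values),
        # Node names worth looking up next on the Nodes tab.
        "lowest_node": min(cpu_by_node, key=cpu_by_node.get),
        "highest_node": max(cpu_by_node, key=cpu_by_node.get),
    }


cpu_percent = {"node-a": 12.0, "node-b": 48.0, "node-c": 51.0, "node-d": 55.0, "node-e": 93.0}
print(summarize(cpu_percent))
```

A low Min points at a scale-down candidate (node-a here), while a high Max or P95 points at a node nearing its limits (node-e here).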

Node count

This performance chart is equivalent to running kubectl get nodes and mapping the STATUS column to a chart grouped by status type.

Figure: Example kubectl get nodes results

Figure: Node count chart

Nodes are reported in either a Ready or Not Ready state. They are counted (along with a total count), and the results of these aggregations are charted. This helps you see, for example, whether your nodes are falling into failed states. Using the Not Ready aggregation, you can quickly see how many nodes in your cluster are currently in the Not Ready state.
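The Ready / Not Ready grouping can be sketched against the JSON that /api/v1/nodes returns (a NodeList). In Kubernetes, a node's readiness is reported by the condition of type "Ready" in status.conditions. This Python sketch counts a node as Ready only when that condition's status is "True" and treats "False", "Unknown", or a missing condition as Not Ready; that mapping, the helper names, and the trimmed sample data are assumptions for illustration.

```python
import json
from collections import Counter
from typing import Dict


def count_nodes_by_readiness(node_list_json: str) -> Dict[str, int]:
    """Count nodes as Ready / Not Ready from a NodeList returned by /api/v1/nodes."""
    node_list = json.loads(node_list_json)
    counts: Counter = Counter()
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        # Ready only if the "Ready" condition has status "True"; anything else is Not Ready.
        ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
        counts["Ready" if ready else "Not Ready"] += 1
    return {"Ready": counts["Ready"], "Not Ready": counts["Not Ready"], "Total": sum(counts.values())}


def _node(name: str, ready_status: str) -> dict:
    """Build a trimmed, made-up Node object with only the fields this sketch reads."""
    return {"metadata": {"name": name},
            "status": {"conditions": [{"type": "Ready", "status": ready_status}]}}


SAMPLE_NODES = json.dumps({"kind": "NodeList", "items": [
    _node("node-a", "True"), _node("node-b", "True"), _node("node-c", "Unknown")]})

print(count_nodes_by_readiness(SAMPLE_NODES))
```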

Active pod count

This performance chart is equivalent to running kubectl get pods --all-namespaces and mapping the STATUS column to a chart grouped by status type.

Figure: Example kubectl get pods results

Figure: Node pod count chart

Note

The status names shown by kubectl may not exactly match the status names in the chart.
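One plausible source of such mismatches: the pod status.phase field in the /api/v1/pods response is one of Pending, Running, Succeeded, Failed, or Unknown, while kubectl's STATUS column often shows a more specific reason, such as CrashLoopBackOff, for a pod whose phase is still Running. The Python sketch below contrasts the two groupings. Whether the chart actually groups by phase is an assumption, and kubectl_like_status is a simplified, hypothetical approximation of kubectl's logic, not its full implementation.

```python
import json
from collections import Counter
from typing import Dict


def count_pods_by_phase(pod_list_json: str) -> Dict[str, int]:
    """Group pods from a PodList (/api/v1/pods) by status.phase."""
    pods = json.loads(pod_list_json).get("items", [])
    return dict(Counter(p.get("status", {}).get("phase", "Unknown") for p in pods))


def kubectl_like_status(pod: dict) -> str:
    """Rough approximation of kubectl's STATUS column (not its full logic):
    a container's waiting reason, if any, takes precedence over the phase."""
    for cs in pod.get("status", {}).get("containerStatuses", []):
        reason = cs.get("state", {}).get("waiting", {}).get("reason")
        if reason:
            return reason
    return pod.get("status", {}).get("phase", "Unknown")


# Trimmed, made-up PodList with only the fields this sketch reads.
SAMPLE_PODS = json.dumps({"kind": "PodList", "items": [
    {"status": {"phase": "Running"}},
    {"status": {"phase": "Running", "containerStatuses": [
        {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"status": {"phase": "Pending"}},
]})

print(count_pods_by_phase(SAMPLE_PODS))
print(Counter(kubectl_like_status(p) for p in json.loads(SAMPLE_PODS)["items"]))
```

Grouped by phase, the sample shows two Running pods; grouped the way kubectl displays them, one of those appears as CrashLoopBackOff instead.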

Next steps

View log query examples to see predefined queries and examples for creating alerts and visualizations, or for performing further analysis of your clusters.