Monitor your Kubernetes cluster performance with Azure Monitor for containers

With Azure Monitor for containers, you can use the performance charts and health status to monitor the workload of Kubernetes clusters hosted on Azure Kubernetes Service (AKS), Azure Stack, or other environments from two perspectives. You can monitor directly from the cluster, or you can view all clusters in a subscription from Azure Monitor. When you monitor a specific AKS cluster, you can also view Azure Container Instances.

This article helps you understand the two perspectives and how Azure Monitor helps you quickly assess, investigate, and resolve detected issues.

For information about how to enable Azure Monitor for containers, see Onboard Azure Monitor for containers.

Azure Monitor provides a multi-cluster view that shows the health status of all monitored Kubernetes clusters running Linux and Windows Server 2019 that are deployed across resource groups in your subscriptions. It also shows clusters discovered across all environments that aren't monitored by the solution. You can immediately understand cluster health, and from here you can drill down to the node and controller performance pages or navigate to the performance charts for the cluster. For AKS clusters that were discovered and identified as unmonitored, you can enable monitoring at any time.

The main differences between monitoring a Windows Server cluster and a Linux cluster with Azure Monitor for containers are described in the overview article.

Sign in to the Azure portal

Sign in to the Azure portal.

Multi-cluster view from Azure Monitor

To view the health status of all deployed Kubernetes clusters, select Monitor from the left pane in the Azure portal. Under the Insights section, select Containers.

Example of an Azure Monitor multi-cluster dashboard

You can scope the results presented in the grid to show clusters that are:

  • Azure - AKS and AKS-Engine clusters hosted in Azure Kubernetes Service
  • Azure Stack (Preview) - AKS-Engine clusters hosted on Azure Stack
  • Non-Azure (Preview) - Kubernetes clusters hosted on-premises
  • All - View all the Kubernetes clusters hosted in Azure, Azure Stack, and on-premises environments that are onboarded to Azure Monitor for containers

To view clusters from a specific environment, select it from the Environments pill in the top-left corner of the page.

Example of the Environments pill selector

On the Monitored clusters tab, you learn the following:

  • How many clusters are in a critical or unhealthy state, versus how many are healthy or not reporting (referred to as an Unknown state).
  • Whether all of the Azure Kubernetes Engine (AKS-engine) deployments are healthy.
  • How many nodes and user and system pods are deployed per cluster.
  • How much disk space is available and whether there's a capacity issue.

The health statuses included are:

  • Healthy: No issues are detected for the VM, and it's functioning as required.
  • Critical: One or more critical issues are detected that must be addressed to restore normal operational state as expected.
  • Warning: One or more issues are detected that must be addressed, or the health condition could become critical.
  • Unknown: If the service wasn't able to make a connection with the node or pod, the status changes to an Unknown state.
  • Not found: The workspace, the resource group, or the subscription that contains the workspace for this solution was deleted.
  • Unauthorized: The user doesn't have the required permissions to read the data in the workspace.
  • Error: An error occurred while attempting to read data from the workspace.
  • Misconfigured: Azure Monitor for containers wasn't configured correctly in the specified workspace.
  • No data: No data has been reported to the workspace in the last 30 minutes.

The overall cluster status is calculated as the worst of three component states (user pods, system pods, and nodes), with one exception: if any of the three states is Unknown, the overall cluster state shows Unknown.

The following table breaks down the calculation that determines the health state of a monitored cluster in the multi-cluster view.

Monitored cluster | Status | Availability
User pod | Healthy | 100%
User pod | Warning | 90-99%
User pod | Critical | <90%
User pod | Unknown | Not reported in the last 30 minutes
System pod | Healthy | 100%
System pod | Warning | N/A
System pod | Critical | <100%
System pod | Unknown | Not reported in the last 30 minutes
Node | Healthy | >85%
Node | Warning | 60-84%
Node | Critical | <60%
Node | Unknown | Not reported in the last 30 minutes
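The following Python sketch illustrates the rollup rule and thresholds described above. It is a minimal, hypothetical model, not how Azure Monitor computes health, and the function names are not part of any Azure API. Each input is assumed to be an availability percentage, with None standing for "not reported in the last 30 minutes". The published node bands leave exactly 85% unassigned, so the sketch assumes that value counts as Warning.

```python
from enum import IntEnum
from typing import Optional


class Health(IntEnum):
    # Ordered so that a larger value means a worse state.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def user_pod_health(availability: float) -> Health:
    # User pods: 100% Healthy, 90-99% Warning, below 90% Critical.
    if availability >= 100:
        return Health.HEALTHY
    if availability >= 90:
        return Health.WARNING
    return Health.CRITICAL


def system_pod_health(availability: float) -> Health:
    # System pods have no Warning band: anything below 100% is Critical.
    return Health.HEALTHY if availability >= 100 else Health.CRITICAL


def node_health(availability: float) -> Health:
    # Nodes: above 85% Healthy, 60-84% Warning, below 60% Critical.
    # Assumption: exactly 85% (unassigned in the table) is treated as Warning.
    if availability > 85:
        return Health.HEALTHY
    if availability >= 60:
        return Health.WARNING
    return Health.CRITICAL


def cluster_health(user_pods: Optional[float],
                   system_pods: Optional[float],
                   nodes: Optional[float]) -> str:
    """Each argument is an availability percentage, or None when nothing
    was reported in the last 30 minutes (hypothetical inputs)."""
    inputs = [(user_pods, user_pod_health),
              (system_pods, system_pod_health),
              (nodes, node_health)]
    states = []
    for value, classify in inputs:
        if value is None:
            # The exception to "worst state wins": any Unknown makes the cluster Unknown.
            return "Unknown"
        states.append(classify(value))
    # Otherwise the overall state is the worst of the three component states.
    return max(states).name.capitalize()


print(cluster_health(95, 100, 90))  # Warning
```

Ordering the states as an IntEnum lets "worst of the three" reduce to a plain max().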

From the list of clusters, you can drill down to the Cluster page by selecting the name of a cluster. Then go to the Nodes performance page by selecting the rollup of nodes in the Nodes column for that cluster. Or, you can drill down to the Controllers performance page by selecting the rollup in the User pods or System pods column.

View performance directly from a cluster

You can access Azure Monitor for containers directly from an AKS cluster by selecting Insights > Cluster from the left pane, or by selecting a cluster from the multi-cluster view. Information about your cluster is organized into four perspectives:

  • Cluster
  • Nodes
  • Controllers
  • Containers

Note

The experience described in the remainder of this article also applies to viewing the performance and health status of Kubernetes clusters hosted on Azure Stack or other environments when you select them from the multi-cluster view.

The default page opens and displays four line charts that show key performance metrics of your cluster.

Example performance charts on the Cluster tab

The performance charts display four performance metrics:

  • Node CPU utilization %: An aggregated perspective of CPU utilization for the entire cluster. To filter the results for the time range, select Avg, Min, 50th, 90th, 95th, or Max in the percentiles selector above the chart. The filters can be used individually or combined.
  • Node memory utilization %: An aggregated perspective of memory utilization for the entire cluster. To filter the results for the time range, select Avg, Min, 50th, 90th, 95th, or Max in the percentiles selector above the chart. The filters can be used individually or combined.
  • Node count: A node count and status from Kubernetes. The cluster node statuses represented are Total, Ready, and Not Ready. They can be filtered individually or combined in the selector above the chart.
  • Active pod count: A pod count and status from Kubernetes. The pod statuses represented are Total, Pending, Running, Unknown, Succeeded, and Failed. They can be filtered individually or combined in the selector above the chart.

Use the Left and Right arrow keys to cycle through each data point on the chart. Use the Up and Down arrow keys to cycle through the percentile lines. Select the pin icon in the upper-right corner of any chart to pin it to the last Azure dashboard you viewed. From the dashboard, you can resize and reposition the chart. Selecting the chart from the dashboard redirects you to Azure Monitor for containers and loads the correct scope and view.

Azure Monitor for containers also supports Azure Monitor metrics explorer. There you can create your own plot charts, correlate and investigate trends, and pin charts to dashboards. From metrics explorer, you can also use the criteria you set to visualize your metrics as the basis of a metric-based alert rule.

View container metrics in metrics explorer

In metrics explorer, you can view aggregated node and pod utilization metrics from Azure Monitor for containers. The following table summarizes the details to help you understand how to use the metric charts to visualize container metrics.

Namespace | Metric | Description
insights.container/nodes | cpuUsageMillicores | Aggregated measurement of CPU utilization across the cluster. One CPU core is split into 1,000 units (milli = one-thousandth), so 1,000 millicores equal one core. Used to determine the usage of cores in a container where many applications might be using one core.
insights.container/nodes | cpuUsagePercentage | Aggregated average CPU utilization across the cluster, measured as a percentage.
insights.container/nodes | memoryRssBytes | Container RSS memory used, in bytes.
insights.container/nodes | memoryRssPercentage | Container RSS memory used, as a percentage.
insights.container/nodes | memoryWorkingSetBytes | Container working set memory used, in bytes.
insights.container/nodes | memoryWorkingSetPercentage | Container working set memory used, as a percentage.
insights.container/nodes | nodesCount | A node count from Kubernetes.
insights.container/pods | PodCount | A pod count from Kubernetes.
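The millicore unit used by cpuUsageMillicores converts to cores by dividing by 1,000. The sketch below shows that conversion, plus one way to express millicore usage as a percentage. The source doesn't state which denominator Azure Monitor uses for cpuUsagePercentage, so the capacity argument (for example, a node's allocatable CPU) is an assumption, and both function names are hypothetical.

```python
MILLICORES_PER_CORE = 1000  # "milli" = one-thousandth of a core


def millicores_to_cores(millicores: float) -> float:
    return millicores / MILLICORES_PER_CORE


def cpu_usage_percentage(usage_millicores: float, capacity_cores: float) -> float:
    """Express millicore usage as a percentage of a capacity given in cores.

    Assumption: the capacity is the denominator; this is an illustration,
    not the formula Azure Monitor documents.
    """
    if capacity_cores <= 0:
        raise ValueError("capacity_cores must be positive")
    return 100 * millicores_to_cores(usage_millicores) / capacity_cores


print(millicores_to_cores(250))         # 0.25
print(cpu_usage_percentage(1500, 4))    # 37.5
```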

You can split a metric to view it by dimension and visualize how its different segments compare to each other. For a node, you can segment the chart by the host dimension. For a pod, you can segment it by the following dimensions:

  • Controller
  • Kubernetes namespace
  • Node
  • Phase
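Conceptually, splitting a metric by a dimension groups the underlying data by that dimension's value and aggregates each group. The sketch below does this for a pod count over hypothetical in-memory records whose keys mirror the dimensions listed above. It only illustrates the idea; it isn't how metrics explorer is implemented.

```python
from collections import Counter

# Hypothetical pod records used only for illustration.
pods = [
    {"name": "web-1", "controller": "web", "namespace": "default",
     "node": "node-a", "phase": "Running"},
    {"name": "web-2", "controller": "web", "namespace": "default",
     "node": "node-b", "phase": "Pending"},
    {"name": "dns-1", "controller": "coredns", "namespace": "kube-system",
     "node": "node-a", "phase": "Running"},
]


def split_pod_count(records, dimension):
    """Count pods per value of one dimension (controller, namespace, node, or phase)."""
    return dict(Counter(record[dimension] for record in records))


print(split_pod_count(pods, "namespace"))  # {'default': 2, 'kube-system': 1}
print(split_pod_count(pods, "phase"))      # {'Running': 2, 'Pending': 1}
```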

Analyze nodes, controllers, and container health

When you switch to the Nodes, Controllers, and Containers tabs, a properties pane automatically displays on the right side of the page. It shows the properties of the selected item, including the labels you defined to organize Kubernetes objects. When a Linux node is selected, the Local Disk Capacity section also shows the available disk space and the percentage used for each disk presented to the node. Select the >> link in the pane to view or hide the pane.

As you expand the objects in the hierarchy, the properties pane updates based on the selected object. From the pane, you can also view Kubernetes container logs (stdout/stderr), events, and pod metrics by selecting the View live data (preview) link at the top of the pane. For more information about the configuration required to grant and control access to view this data, see Set up the Live Data (preview). While you review cluster resources, you can see this data from the container in real time. For more information about this feature, see How to view Kubernetes logs, events, and pod metrics in real time. To view Kubernetes log data stored in your workspace based on predefined log searches, select View container logs from the View in analytics drop-down list. For more information about this topic, see Search logs to analyze data.

Use the + Add Filter option at the top of the page to filter the results for the view by Service, Node, Namespace, or Node Pool. After you select the filter scope, select one of the values shown in the Select value(s) field. After the filter is configured, it's applied globally while you view any perspective of the AKS cluster. The filter expression supports only the equal sign (=). You can add more filters on top of the first one to further narrow your results. For example, if you specify a filter by Node, you can select only Service or Namespace for the second filter.

A filter specified in one tab continues to be applied when you select another tab. To delete it, select the x symbol next to the filter.
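Filters that support only equality and stack to narrow the results behave like a logical AND of "scope = value" tests. The sketch below models this over hypothetical rows. The row fields and function names are assumptions, and beyond refusing to filter the same scope twice, it doesn't reproduce which scopes the portal offers for a second filter.

```python
# Hypothetical rows as they might appear in the Nodes, Controllers, or Containers views.
rows = [
    {"Namespace": "default", "Node": "node-a", "Service": "web"},
    {"Namespace": "default", "Node": "node-b", "Service": "web"},
    {"Namespace": "kube-system", "Node": "node-a", "Service": "dns"},
]


def add_filter(filters, scope, value):
    """Return a new filter set with one more 'scope = value' condition."""
    if scope in filters:
        raise ValueError(f"a filter on {scope!r} is already applied")
    return {**filters, scope: value}


def remove_filter(filters, scope):
    """Equivalent of selecting the x next to a filter."""
    return {k: v for k, v in filters.items() if k != scope}


def apply_filters(items, filters):
    """Keep items that match every filter; only equality is tested."""
    return [item for item in items
            if all(item.get(scope) == value for scope, value in filters.items())]


active = add_filter({}, "Namespace", "default")
active = add_filter(active, "Node", "node-a")
print(apply_filters(rows, active))  # only the default / node-a row remains
```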

Switch to the Nodes tab, and the row hierarchy follows the Kubernetes object model, which starts with a node in your cluster. Expand the node to view one or more pods running on it. If more than one container is grouped to a pod, they're displayed as the last row in the hierarchy. You can also view how many non-pod-related workloads are running on the host if the host has processor or memory pressure.

Example of the Kubernetes node hierarchy in the performance view

Windows Server containers that run the Windows Server 2019 OS are shown after all of the Linux-based nodes in the list. When you expand a Windows Server node, you can view one or more pods and containers that run on the node. After a node is selected, the properties pane shows version information.

Example node hierarchy with Windows Server nodes listed

Azure Container Instances virtual nodes that run the Linux OS are shown after the last AKS cluster node in the list. When you expand a Container Instances virtual node, you can view one or more Container Instances pods and containers that run on the node. Metrics are collected and reported only for pods, not for nodes.

Example node hierarchy with Container Instances listed

From an expanded node, you can drill down from a pod or container that runs on the node to its controller to view performance data filtered for that controller. Select the value under the Controller column for the specific node.

Example drill-down from node to controller in the performance view

Select Controllers or Containers at the top of the page to review the status and resource utilization for those objects. To review memory utilization, select Memory RSS or Memory working set in the Metric drop-down list. Memory RSS is supported only for Kubernetes version 1.8 and later. Otherwise, the Min % values display as NaN %, a numeric data type value that represents an undefined or unrepresentable value.

Container nodes performance view

Memory working set includes both resident memory and virtual memory (cache), and is the total of what the application is using. Memory RSS shows only main memory, that is, resident memory. This metric shows the actual capacity of available memory. What is the difference between resident memory and virtual memory?

  • Resident memory, or main memory, is the actual amount of machine memory available to the nodes of the cluster.

  • Virtual memory is reserved hard disk space (cache) that the operating system uses to swap data from memory to disk when under memory pressure, and then fetch it back into memory when needed.

By default, performance data is based on the last six hours, but you can change the window by using the TimeRange option in the upper left. You can also filter the results within the time range by selecting Min, Avg, 50th, 90th, 95th, and Max in the percentile selector.
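The percentile selector summarizes the samples collected in the time range. The sketch below computes the same six statistics using the nearest-rank percentile definition. That definition is an assumption: the exact percentile method Azure Monitor uses isn't stated here, so its values may differ slightly. The sample data is hypothetical.

```python
import statistics


def nearest_rank_percentile(sorted_samples, p):
    """Smallest sample such that at least p% of samples are <= it.
    Integer ceiling avoids floating-point rounding in the rank."""
    n = len(sorted_samples)
    rank = max(1, -(-p * n // 100))  # ceil(p * n / 100) for integer p and n
    return sorted_samples[rank - 1]


def summarize(samples):
    """Min, Avg, 50th, 90th, 95th, and Max over one time range."""
    if not samples:
        raise ValueError("no samples in the selected time range")
    ordered = sorted(samples)
    return {
        "Min": ordered[0],
        "Avg": statistics.mean(ordered),
        "50th": nearest_rank_percentile(ordered, 50),
        "90th": nearest_rank_percentile(ordered, 90),
        "95th": nearest_rank_percentile(ordered, 95),
        "Max": ordered[-1],
    }


cpu_percent = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical samples
print(summarize(cpu_percent))
```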

Percentile selection for data filtering

When you hover over the bar graph under the Trend column, each bar shows either CPU or memory usage, depending on which metric is selected, within a 15-minute sample period. After you select the trend chart by using the keyboard, use Alt+Page Up or Alt+Page Down to cycle through each bar individually. You get the same details that you would if you hovered over the bar.
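Each trend bar summarizes one 15-minute sample period. The sketch below buckets timestamped samples into clock-aligned 15-minute windows and averages each window. The clock alignment and the use of an average per window are assumptions made for illustration, and the sample data is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def trend_bars(samples):
    """samples: iterable of (timestamp, value) pairs.
    Returns {window_start: average value}, ordered by time."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bin_start(ts)].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


t0 = datetime(2024, 1, 1, 10, 0)  # hypothetical sample times
samples = [(t0, 10.0), (t0 + timedelta(minutes=5), 20.0),
           (t0 + timedelta(minutes=16), 40.0)]
for start, avg in trend_bars(samples).items():
    print(start.strftime("%H:%M"), avg)  # 10:00 15.0, then 10:15 40.0
```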

Example of hovering over a trend bar chart

In the next example, for the first node in the list, aks-nodepool1-, the value for Containers is 9. This value is a rollup of the total number of containers deployed on the node.

Example of the container rollup for a single node

This information can help you quickly identify whether containers are properly balanced across the nodes in your cluster.
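If you export the per-node Containers rollups, a simple check can flag nodes that stand out. The sketch below uses a hypothetical heuristic, not an Azure Monitor feature: it lists nodes whose container count differs from the cluster mean by more than a chosen fraction of that mean. The node names and the 0.5 tolerance are illustrative assumptions.

```python
from statistics import mean


def imbalanced_nodes(containers_per_node, tolerance=0.5):
    """Return nodes whose container count differs from the cluster mean by
    more than `tolerance` times the mean. Hypothetical heuristic."""
    if not containers_per_node:
        return []
    avg = mean(containers_per_node.values())
    return sorted(node for node, count in containers_per_node.items()
                  if abs(count - avg) > tolerance * avg)


# Hypothetical rollups read from the Containers column of the Nodes tab.
print(imbalanced_nodes({"node-a": 9, "node-b": 8, "node-c": 1}))  # ['node-c']
```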

The following table describes the information that's presented when you view the Nodes tab.

Column | Description
Name | The name of the host.
Status | Kubernetes view of the node status.
Min %, Avg %, 50th %, 90th %, 95th %, Max % | Average node percentage based on the selected percentile during the selected duration.
Min, Avg, 50th, 90th, 95th, Max | Average actual node value based on the selected percentile during the selected duration. The average value is measured from the CPU/memory limit set for a node. For pods and containers, it's the average value reported by the host.
Containers | Number of containers.
Uptime | The time since a node started or was rebooted.
Controller | Only for containers and pods. It shows which controller the item resides in. Not all pods are in a controller, so some might display N/A.
Trend Min %, Avg %, 50th %, 90th %, 95th %, Max % | Bar graph trend that represents the average percentile metric percentage of the controller.

After you expand a node, you may notice a workload named Other process. It represents non-containerized processes that run on your node, including:

  • Self-managed or managed Kubernetes non-containerized processes

  • Container run-time processes

  • Kubelet

  • System processes running on your node

  • Other non-Kubernetes workloads running on the node hardware or VM

It's calculated as: total usage from cAdvisor - usage from containerized processes.
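The Other process calculation is a subtraction: node-level usage reported by cAdvisor minus the summed usage of all containerized processes. The sketch below applies it to hypothetical CPU samples in millicores. Clamping the result at zero is an assumption added for robustness, because samples taken at slightly different moments could otherwise yield a small negative number.

```python
def other_process_usage(total_node_usage, containerized_usages):
    """Usage attributed to "Other process" on one node: total usage reported
    by cAdvisor minus the usage of all containerized processes."""
    other = total_node_usage - sum(containerized_usages)
    # Assumption: clamp at zero so sampling skew can't produce negative usage.
    return max(other, 0)


# Hypothetical CPU samples in millicores for one node.
print(other_process_usage(1200, [300, 450, 250]))  # 200
```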

In the selector, select Controllers.

Select the Controllers view

Here you can view the performance health of your controllers, of Container Instances virtual node controllers, and of virtual node pods not connected to a controller.

<Name> controllers performance view

The row hierarchy starts with a controller. When you expand a controller, you view one or more pods. Expand a pod, and the last row displays the containers grouped to the pod. From an expanded controller, you can drill down to the node it's running on to view performance data filtered for that node. Container Instances pods not connected to a controller are listed last.

Example controller hierarchy with Container Instances pods listed

Select the value under the Node column for the specific controller.

Example drill-down from controller to node in the performance view

The following table describes the information that's displayed when you view controllers.

Column | Description
Name | The name of the controller.
Status | The rollup status of the containers after they've finished running, with a status such as OK, Terminated, Failed, Stopped, or Paused. If the container is running but the status either wasn't properly displayed or wasn't picked up by the agent and hasn't responded for more than 30 minutes, the status is Unknown. Additional details of the status icon are provided in the following table.
Min %, Avg %, 50th %, 90th %, 95th %, Max % | Rollup average of the average percentage of each entity for the selected metric and percentile.
Min, Avg, 50th, 90th, 95th, Max | Rollup of the average CPU millicore or memory performance of the container for the selected percentile. The average value is measured from the CPU/memory limit set for a pod.
Containers | Total number of containers for the controller or pod.
Restarts | Rollup of the restart count from containers.
Uptime | The time since a container started.
Node | Only for containers and pods. It shows which node the item resides on.
Trend Min %, Avg %, 50th %, 90th %, 95th %, Max % | Bar graph trend that represents the average percentile metric of the controller.

The icons in the status field indicate the online status of the containers.

Icon | Status
Ready running status icon | Running (Ready)
Waiting or Paused status icon | Waiting or Paused
Last reported running status icon | Last reported running but hasn't responded for more than 30 minutes
Successful status icon | Successfully stopped or failed to stop

The status icon displays a count based on what the pod provides. It shows the worst two states, and when you hover over the status, it displays a rollup status from all pods in the container. If there isn't a ready state, the status value displays (0).
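Showing "the worst two states" requires some severity ordering, which the source doesn't spell out. The sketch below assumes, purely for illustration, a worst-first order that reverses the order in which the icon tables list the states. It returns the two worst states present with their counts, plus the ready count, which is 0 when no pod is ready. The state names and ordering are hypothetical.

```python
from collections import Counter

# Assumption: worst-first severity, taken as the reverse of the order in which
# the icon tables list the states. The real ordering isn't documented here.
SEVERITY = ["Failed", "Terminated", "Unknown", "Waiting", "Ready"]


def status_rollup(pod_states):
    """Return the two worst states present with their counts, plus the
    number of ready pods (displayed as (0) when none are ready)."""
    counts = Counter(pod_states)
    present = [state for state in SEVERITY if counts[state] > 0]
    worst_two = [(state, counts[state]) for state in present[:2]]
    return worst_two, counts["Ready"]


print(status_rollup(["Ready", "Ready", "Waiting", "Failed", "Waiting"]))
# ([('Failed', 1), ('Waiting', 2)], 2)
```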

In the selector, select Containers.

Select the Containers view

Here you can view the performance health of your Azure Kubernetes and Azure Container Instances containers.

<Name> containers performance view

From a container, you can drill down to a pod or node to view performance data filtered for that object. Select the value under the Pod or Node column for the specific container.

Example drill-down from container to pod or node in the performance view

The following table describes the information that's displayed when you view containers.

Column | Description
Name | The name of the container.
Status | Status of the containers, if any. Additional details of the status icon are provided in the next table.
Min %, Avg %, 50th %, 90th %, 95th %, Max % | The rollup of the average percentage of each entity for the selected metric and percentile.
Min, Avg, 50th, 90th, 95th, Max | The rollup of the average CPU millicore or memory performance of the container for the selected percentile. The average value is measured from the CPU/memory limit set for a pod.
Pod | The pod where the container resides.
Node | The node where the container resides.
Restarts | The number of times the container has restarted.
Uptime | The time since a container was started or restarted.
Trend Min %, Avg %, 50th %, 90th %, 95th %, Max % | Bar graph trend that represents the average percentile metric percentage of the container.

The icons in the status field indicate the online statuses of pods, as described in the following table.

Icon | Status
Ready running status icon | Running (Ready)
Waiting or Paused status icon | Waiting or Paused
Last reported running status icon | Last reported running but hasn't responded in more than 30 minutes
Terminated status icon | Successfully stopped or failed to stop
Failed status icon | Failed state

Workbooks

Workbooks combine text, log queries, metrics, and parameters into rich interactive reports. Workbooks are editable by any other team members who have access to the same Azure resources.

Azure Monitor for containers includes four workbooks to get you started:

  • Disk capacity: Presents interactive disk usage charts for each disk presented to the node within a container, from the following perspectives:

    • Disk percent usage for all disks.
    • Free disk space for all disks.
    • A grid that shows each node's disk, its percentage of used space, trend of percentage of used space, free disk space (GiB), and trend of free disk space (GiB). When a row is selected in the table, the percentage of used space and free disk space (GiB) is shown underneath the row.
  • Disk IO: Presents interactive disk utilization charts for each disk presented to the node within a container, from the following perspectives:

    • Disk I/O summarized across all disks by read bytes/sec, write bytes/sec, and read and write bytes/sec trends.
    • Eight performance charts that show key performance indicators to help measure and identify disk I/O bottlenecks.
  • Kubelet: Includes two grids that show key node operating statistics:

    • Overview by node grid summarizes total operations, total errors, and successful operations by percent and trend for each node.
    • Overview by operation type summarizes, for each operation, the total operations, total errors, and successful operations by percent and trend.
  • Network: Presents interactive network utilization charts for each node's network adapter, and a grid that presents the key performance indicators to help measure the performance of your network adapters.
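Two of the workbook figures come from simple arithmetic. The sketch below shows them, assuming a disk's used percentage is derived from its size and free space, and that a Kubelet operation counts as successful when it didn't error. Both formulas and the function names are illustrative assumptions, not the workbooks' actual queries.

```python
def disk_used_percent(total_gib: float, free_gib: float) -> float:
    """Percentage of used space on one disk, from its size and free space."""
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    return 100 * (total_gib - free_gib) / total_gib


def success_percent(total_operations: int, total_errors: int):
    """Share of Kubelet operations that didn't error, as a percentage.
    Assumption: successful = total - errors. Returns None with no operations."""
    if total_operations == 0:
        return None
    return 100 * (total_operations - total_errors) / total_operations


print(disk_used_percent(128, 32))  # 75.0
print(success_percent(200, 5))     # 97.5
```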

You access these workbooks by selecting each one from the View Workbooks drop-down list.

View Workbooks drop-down list

Next steps