Best practices for monitoring Kubernetes with Azure Monitor

This article provides best practices for monitoring the health and performance of your Azure Kubernetes Service (AKS) and Azure Arc-enabled Kubernetes clusters. The guidance is based on the five pillars of architecture excellence described in Azure Well-Architected Framework.

Reliability

In the cloud, we acknowledge that failures happen. Instead of trying to prevent failures altogether, the goal is to minimize the effects of a single failing component. Use the following information to best leverage Azure Monitor to ensure the reliability of your Kubernetes clusters and monitoring environment.

Design checklist

  • Enable Container insights for collection of logs and performance data from your cluster.
  • Create diagnostic settings to collect control plane logs for AKS clusters.
  • Ensure the availability of the Log Analytics workspace supporting Container insights.

Configuration recommendations

Recommendation | Benefit
Enable Container insights for collection of logs and performance data from your cluster. | Container insights collects stdout/stderr logs, performance metrics, and Kubernetes events from each node in your cluster. It provides dashboards and reports for analyzing this data, including the availability of your nodes and other components. Use Log Analytics to identify any availability errors in your collected logs.
Create diagnostic settings to collect control plane logs for AKS clusters. | AKS implements control plane logs as resource logs in Azure Monitor. Create a diagnostic setting to send these logs to your Log Analytics workspace so you can use log queries to identify errors and issues affecting availability.
Ensure the availability of the Log Analytics workspace supporting Container insights. | Container insights relies on a Log Analytics workspace, so the availability of that workspace directly affects the monitoring of your cluster.
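
To make the diagnostic setting recommendation more concrete, the following minimal sketch builds a request body of the general shape used by Azure Monitor diagnostic settings, listing the log categories to send to a workspace. The helper name build_diagnostic_setting, the default category list, and the placeholder workspace ID are assumptions for illustration only; confirm which control plane log categories your cluster exposes before relying on them.

```python
import json

# Control plane log categories to collect. This default list is an assumption
# for illustration; confirm the categories available on your own cluster.
DEFAULT_CATEGORIES = (
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
    "kube-audit-admin",
)


def build_diagnostic_setting(workspace_id, categories=DEFAULT_CATEGORIES):
    """Return a diagnostic setting body that sends the given log categories
    to a Log Analytics workspace.

    Hypothetical helper: it only builds a dictionary and calls no Azure API.
    """
    if not workspace_id:
        raise ValueError("workspace_id is required")
    return {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [{"category": name, "enabled": True} for name in categories],
        }
    }


if __name__ == "__main__":
    # Placeholder resource ID; substitute the ID of your own workspace.
    body = build_diagnostic_setting(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    )
    print(json.dumps(body, indent=2))
```

Keeping the category list as a parameter makes it easy to collect only the categories you actually query, which also matters for the cost recommendations later in this article.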

Security

Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to monitor your Kubernetes clusters and ensure that only authorized users access collected data.

Design checklist

  • Use managed identity authentication for your cluster to connect to Container insights.
  • Use traffic analytics to monitor network traffic to and from your cluster.
  • Enable network observability.
  • Ensure the security of the Log Analytics workspace supporting Container insights.

Configuration recommendations

Recommendation | Benefit
Use managed identity authentication for your cluster to connect to Container insights. | Managed identity authentication is the default for new clusters. If you're using legacy authentication, you should migrate to managed identity to remove the certificate-based local authentication.
Use traffic analytics to monitor network traffic to and from your cluster. | Traffic analytics analyzes Azure Network Watcher NSG flow logs to provide insights into traffic flow in your Azure cloud. Use this tool to ensure there's no data exfiltration for your cluster and to detect if any unnecessary public IPs are exposed.
Enable network observability. | The network observability add-on for AKS provides observability across the multiple layers in the Kubernetes networking stack. Use it to monitor and observe access between services in the cluster.
Ensure the security of the Log Analytics workspace supporting Container insights. | Container insights relies on a Log Analytics workspace, so securing that workspace protects the data collected from your cluster.

Cost optimization

Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.

Note

See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.

Design checklist

  • Configure agent collection to modify data collection in Container insights.
  • Modify settings for collection of metric data by Container insights.
  • Disable Container insights collection of metric data if you don't use the Container insights experience in the Azure portal.
  • If you don't query the container logs table regularly or use it for alerts, configure it as basic logs.
  • Limit collection of resource logs you don't need.
  • Use resource-specific logging for AKS resource logs and configure tables as basic logs.
  • Use OpenCost to collect details about your Kubernetes costs.

Configuration recommendations

Recommendation | Benefit
Configure agent collection to modify data collection in Container insights. | Analyze the data collected by Container insights as described in Controlling ingestion to reduce cost, and adjust your configuration to stop collecting data you don't need.
Modify settings for collection of metric data by Container insights. | See Enable cost optimization settings for details on modifying both the frequency at which metric data is collected and the namespaces that are collected by Container insights.
If you don't query the container logs table regularly or use it for alerts, configure it as basic logs. | Convert your Container insights schema to ContainerLogV2, which is compatible with Basic logs and can provide significant cost savings, as described in Controlling ingestion to reduce cost.
Limit collection of resource logs you don't need. | Control plane logs for AKS clusters are implemented as resource logs in Azure Monitor. Create a diagnostic setting to send this data to a Log Analytics workspace. See Collect control plane logs for AKS clusters for recommendations on which categories you should collect.
Use resource-specific logging for AKS resource logs and configure tables as basic logs. | AKS supports either Azure diagnostics mode or resource-specific mode for resource logs. Specify resource-specific mode to enable the option to configure the tables for basic logs, which provide a reduced ingestion charge for logs that you only occasionally query and don't use for alerting.
Use OpenCost to collect details about your Kubernetes costs. | OpenCost is an open-source, vendor-neutral CNCF sandbox project for understanding your Kubernetes costs and supporting AKS cost visibility. It exports detailed costing data, in addition to customer-specific Azure pricing, to Azure storage to assist the cluster administrator in analyzing and categorizing costs.
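
To judge whether moving a table to basic logs is worthwhile, it can help to estimate the difference in ingestion charges. The sketch below is simple arithmetic under stated assumptions: the function names are hypothetical, the per-GB prices in the usage example are invented placeholders rather than Azure list prices, and any other charges that may apply, such as query or retention charges, are ignored.

```python
def monthly_ingestion_cost(gb_per_day, price_per_gb, days=30):
    """Estimate one month's ingestion charge for a single table."""
    if gb_per_day < 0 or price_per_gb < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    return gb_per_day * price_per_gb * days


def basic_logs_savings(gb_per_day, standard_price, basic_price, days=30):
    """Estimate the monthly ingestion savings from configuring a table as
    basic logs instead of leaving it on the standard plan.

    Only ingestion is modeled; other charges are deliberately left out.
    """
    standard = monthly_ingestion_cost(gb_per_day, standard_price, days)
    basic = monthly_ingestion_cost(gb_per_day, basic_price, days)
    return standard - basic


if __name__ == "__main__":
    # Hypothetical figures: 10 GB/day of container logs and placeholder
    # prices. Substitute the rates that apply to your region and agreement.
    saving = basic_logs_savings(gb_per_day=10, standard_price=2.0, basic_price=0.5)
    print(f"Estimated monthly ingestion saving: {saving:.2f}")
```

Because the estimate covers ingestion only, it is most meaningful for tables that you rarely query and don't use for alerting, which is the case the recommendation targets.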

Operational excellence

Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for monitoring your Kubernetes clusters.

Design checklist

  • Review guidance for monitoring all layers of your Kubernetes environment.
  • Use Azure Arc-enabled Kubernetes to monitor your clusters outside of Azure.
  • Integrate AKS clusters into your existing monitoring tools.
  • Use Azure Policy to enable data collection from your Kubernetes cluster.

Configuration recommendations

Recommendation | Benefit
Review guidance for monitoring all layers of your Kubernetes environment. | See Monitor your Kubernetes cluster performance with Container insights for guidance and best practices for monitoring your entire Kubernetes environment across the network, cluster, and application layers.
Use Azure Arc-enabled Kubernetes to monitor your clusters outside of Azure. | Azure Arc-enabled Kubernetes allows your Kubernetes clusters running in other clouds to be monitored using the same tools as your AKS clusters, including Container insights.
Integrate AKS clusters into your existing monitoring tools. | If you have an existing investment in Prometheus and Grafana, integrate your AKS clusters and Azure managed services into your existing environment using the guidance in Monitor Kubernetes clusters using Azure services and cloud native tools.
Use Azure Policy to enable data collection from your Kubernetes cluster. | Use Azure Policy to enable data collection, including Container insights and diagnostic settings. This ensures that any new clusters are automatically monitored and enforces their monitoring configuration.
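
The kind of check that a policy enforces can be pictured with a small sketch. The example below is purely conceptual and does not call Azure Policy or any Azure API: the ClusterMonitoringState record, the find_monitoring_gaps function, and the sample inventory are all hypothetical names and data used to show what "every cluster has Container insights and diagnostic settings enabled" means as a rule.

```python
from dataclasses import dataclass


@dataclass
class ClusterMonitoringState:
    """Hypothetical inventory record describing one cluster's monitoring setup."""
    name: str
    container_insights_enabled: bool
    diagnostic_settings_enabled: bool


def find_monitoring_gaps(clusters):
    """Map each non-compliant cluster name to the monitoring features it lacks.

    Clusters that have both features enabled are omitted from the result.
    """
    gaps = {}
    for cluster in clusters:
        missing = []
        if not cluster.container_insights_enabled:
            missing.append("Container insights")
        if not cluster.diagnostic_settings_enabled:
            missing.append("diagnostic settings")
        if missing:
            gaps[cluster.name] = missing
    return gaps


if __name__ == "__main__":
    # Invented sample inventory for illustration.
    inventory = [
        ClusterMonitoringState("prod-aks", True, True),
        ClusterMonitoringState("dev-aks", True, False),
        ClusterMonitoringState("edge-arc", False, False),
    ]
    for name, missing in find_monitoring_gaps(inventory).items():
        print(f"{name}: missing {', '.join(missing)}")
```

In practice, assigning a policy removes the need to run this kind of audit by hand, because the required configuration is applied to new clusters as they're created.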

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Use the following information to monitor the performance of your Kubernetes clusters and ensure they're configured for maximum performance.

Design checklist

  • Enable Container insights to track performance of your cluster.

Configuration recommendations

Recommendation | Benefit
Enable Container insights to track performance of your cluster. | When you enable Container insights for your Kubernetes cluster, you can use views and workbooks to track the performance of the components of your cluster. See Cost optimization for recommendations regarding cost.

Next step