Configure throttling for Container Insights

Note

Container Insights logs are only throttled when Container Network Logs are being collected. If you have not enabled the collection of Container Network Logs, throttling is not enabled on your cluster.

Azure Monitor - Container Insights allows customers to collect logs generated in their Azure Kubernetes Service (AKS) clusters. Depending on the workload and logging configuration, the volume of logs can be substantial, which can lead to throttling and log loss. This article describes the default values at which Container Insights throttles logs and how to modify them. The final section covers how to monitor for potential throttling issues with the Quality-of-Service (QoS) Grafana dashboard.

Default throttling values

Throttling is enabled by default with the following values:

| ConfigMap setting | Default value | Description |
| --- | --- | --- |
| throttle_enabled | true | Enables or disables throttling of network flow log messages. |
| throttle_rate | 5000 | Number of log records allowed during a time window. Valid range: 1 to 25,000. |
| throttle_window | 300 | Number of intervals over which the average is calculated. |
| throttle_interval | 1s | Time interval, expressed in "sleep" format. Examples: 3s, 1.5m, 0.5h. |
| throttle_print | false | Whether to print status messages with the current rate and the limits to the information logs. |
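
Before editing the ConfigMap, it can help to check candidate values against the documented limits. The following is a minimal sketch and not part of Container Insights: both function names are hypothetical, and the interval check assumes that only the s, m, and h suffixes shown in the examples are accepted.

```shell
# Hypothetical helpers for illustration; they are not part of Container Insights.

# Succeeds when $1 is a whole number in the documented range 1 to 25,000.
is_valid_throttle_rate() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 25000 ]
}

# Succeeds when $1 looks like the documented examples (3s, 1.5m, 0.5h).
# Assumption: only the s, m, and h suffixes are accepted.
is_valid_throttle_interval() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?[smh]$'
}
```

For example, `is_valid_throttle_rate 30000 || echo "out of range"` reports a value above the documented maximum.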

Modifying throttling values

To modify the default values, download the ConfigMap and edit the following settings in the downloaded file:

throttle_enabled = true # Default: true. Enables or disables throttling of network flow log messages.
throttle_rate = 5000 # Default: 5000. Valid range: 1 to 25,000. Number of log records allowed during a time window.
throttle_window = 300 # Default: 300. Number of intervals over which the average is calculated.
throttle_interval = "1s" # Default: 1s. Time interval in "sleep" format. Examples: 3s, 1.5m, 0.5h.
throttle_print = false # Default: false. Whether to print status messages with the current rate and the limits to the information logs.
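
One way to change a single setting without hand-editing the file is a small sed filter. This is a sketch under assumptions: set_throttle_rate is a hypothetical helper, the file names in the usage comment are placeholders, and the setting is assumed to sit on its own line in the form shown above.

```shell
# Hypothetical helper: reads ConfigMap text on stdin and writes it to stdout
# with the numeric value of throttle_rate replaced by $1.
# Anything after the number on that line (such as a comment) is kept.
set_throttle_rate() {
  sed -E "s/^([[:space:]]*throttle_rate[[:space:]]*=[[:space:]]*)[0-9]+/\1$1/"
}

# Example usage (placeholder file names; use the file you downloaded):
# set_throttle_rate 10000 < configmap.yaml > configmap.updated.yaml
```

Writing to a new file rather than editing in place keeps the original available for comparison before you apply the change.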

After you apply the ConfigMap with the kubectl apply command, the pods restart within a few minutes.

kubectl apply -f agent_settings.networkflow_logs_config.yaml
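
To confirm that the restart happened, you can list the agent pods and check their AGE column. The following is a minimal sketch that assumes the agent pods run in the kube-system namespace with the ama-logs name prefix; list_ama_logs_pods is a hypothetical helper, not a Container Insights command.

```shell
# Hypothetical helper: lists pods in kube-system whose names contain "ama-logs".
# Assumption: the Container Insights agent pods use the ama-logs name prefix.
list_ama_logs_pods() {
  kubectl get pods -n kube-system | grep ama-logs
}
```

A low value in the AGE column shortly after the apply suggests the pods picked up the new settings.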

Monitor QoS metrics with Prometheus and Grafana

The Logs add-on that collects Container Network Logs publishes QoS metrics that can be used to monitor for throttling and log loss. This section covers how customers can use Azure Monitor managed service for Prometheus to collect these metrics and then visualize them with Grafana.

Prerequisites

Configuration steps

  1. Download the ama-metrics-prometheus-config-node ConfigMap and save it with the .yaml file name that the apply command in the next step expects:
curl -L -o ama-metrics-prometheus-config-node.yaml https://aka.ms/ama-metrics-prometheus-config-node
  2. Check whether the cluster already has an ama-metrics-prometheus-config-node ConfigMap:
kubectl get cm -n kube-system | grep ama-metrics-prometheus-config-node

If a ConfigMap already exists, add the ama-logs-daemonset scrape job to it. Otherwise, apply the downloaded ConfigMap:

kubectl apply -f ama-metrics-prometheus-config-node.yaml
  3. Import the Grafana dashboard JSON file into the Azure Managed Grafana instance.

  4. Set enable_internal_metrics = true in the ConfigMap https://github.com/microsoft/Docker-Provider/blob/ci_prod/kubernetes/container-azm-ms-agentconfig.yaml#L220

Apply the ConfigMap with:

kubectl apply -f container-azm-ms-agentconfig.yaml
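
The check-then-apply decision for the ama-metrics-prometheus-config-node ConfigMap in the steps above can be sketched as a small shell function. apply_node_config_if_missing is a hypothetical name, and the sketch assumes the downloaded file is saved as ama-metrics-prometheus-config-node.yaml in the current directory.

```shell
# Hypothetical helper: applies the downloaded ConfigMap only when the cluster
# does not already have one named ama-metrics-prometheus-config-node.
apply_node_config_if_missing() {
  if kubectl get cm ama-metrics-prometheus-config-node -n kube-system >/dev/null 2>&1; then
    # An existing ConfigMap is left untouched so its scrape jobs are not overwritten.
    echo "ConfigMap already exists; add the ama-logs-daemonset scrape job to it manually."
  else
    kubectl apply -f ama-metrics-prometheus-config-node.yaml
  fi
}
```

Leaving an existing ConfigMap untouched is deliberate here: applying the downloaded file over it could replace scrape jobs that are already configured.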

Final dashboard

The final QoS dashboard with data flowing is shown in the following image:

Image showing the final result setting up QoS monitoring for Container Insights.