This article contains all the monitoring reference information for this service.
See Monitor Azure Kubernetes Service (AKS) for details on the data you can collect for AKS and how to use it.
This section lists all the automatically collected platform metrics for this service.
For information on metric retention, see Azure Monitor Metrics overview.
The following metrics are allow-listed with minimalingestionprofile=true for default ON targets. They are collected by default because these targets are scraped by default.
controlplane-apiserver:
apiserver_request_total
apiserver_cache_list_fetched_objects_total
apiserver_cache_list_returned_objects_total
apiserver_flowcontrol_demand_seats_average
apiserver_flowcontrol_current_limit_seats
apiserver_request_sli_duration_seconds_bucket
apiserver_request_sli_duration_seconds_sum
apiserver_request_sli_duration_seconds_count
process_start_time_seconds
apiserver_request_duration_seconds_bucket
apiserver_request_duration_seconds_sum
apiserver_request_duration_seconds_count
apiserver_storage_list_fetched_objects_total
apiserver_storage_list_returned_objects_total
apiserver_current_inflight_requests
Note
apiserver_request_sli_duration_seconds_bucket and apiserver_request_duration_seconds_bucket are no longer collected as of a recent release. These are high-cardinality metrics that can increase the number of metrics stored, depending on the number of custom resources in the cluster. To collect these bucket metrics, add them to the keep list. We strongly recommend that you don't turn off the minimal ingestion profile for the control plane components.
controlplane-etcd:
etcd_server_has_leader
rest_client_requests_total
etcd_mvcc_db_total_size_in_bytes
etcd_mvcc_db_total_size_in_use_in_bytes
etcd_server_slow_read_indexes_total
etcd_server_slow_apply_total
etcd_network_client_grpc_sent_bytes_total
etcd_server_heartbeat_send_failures_total
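The note on the API server bucket metrics says they can be added back through the keep list. The following is a minimal sketch of how such a keep-list value could be assembled, not a definitive procedure. It assumes the keep list accepts one regular expression per target, written as a `<target> = "<regex>"` line; the helper names `build_keep_list_regex` and `keep_list_entry` are hypothetical and exist only for this illustration.

```python
import re

# The two bucket metrics named in the note on controlplane-apiserver.
BUCKET_METRICS = [
    "apiserver_request_sli_duration_seconds_bucket",
    "apiserver_request_duration_seconds_bucket",
]


def build_keep_list_regex(metric_names):
    """Join metric names into one alternation, escaping regex metacharacters."""
    return "|".join(re.escape(name) for name in metric_names)


def keep_list_entry(target, metric_names):
    # Assumption: the keep list takes one `<target> = "<regex>"` line per target.
    return f'{target} = "{build_keep_list_regex(metric_names)}"'


entry = keep_list_entry("controlplane-apiserver", BUCKET_METRICS)
print(entry)
```

Building the expression from a list keeps each metric name reviewable on its own line. Check the resulting value against the current ConfigMap documentation before you apply it to a cluster.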
The following metrics are allow-listed with minimalingestionprofile=true for default OFF targets. These metrics aren't collected by default. To turn on scraping for these targets, set default-scrape-settings-enabled.<target-name>=true in the default-scrape-settings-enabled section of the ama-metrics-settings-configmap.
controlplane-kube-controller-manager:
workqueue_depth
rest_client_requests_total
rest_client_request_duration_seconds
controlplane-kube-scheduler:
scheduler_pending_pods
scheduler_unschedulable_pods
scheduler_queue_incoming_pods_total
scheduler_schedule_attempts_total
scheduler_preemption_attempts_total
controlplane-cluster-autoscaler:
rest_client_requests_total
cluster_autoscaler_last_activity
cluster_autoscaler_cluster_safe_to_autoscale
cluster_autoscaler_failed_scale_ups_total
cluster_autoscaler_scale_down_in_cooldown
cluster_autoscaler_scaled_up_nodes_total
cluster_autoscaler_unneeded_nodes_count
cluster_autoscaler_unschedulable_pods_count
cluster_autoscaler_nodes_count
cloudprovider_azure_api_request_errors
cloudprovider_azure_api_request_duration_seconds_bucket
cloudprovider_azure_api_request_duration_seconds_count
Note
The CPU and memory usage metrics for all control-plane targets aren't exposed, regardless of the profile.
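The default OFF targets listed above are enabled one at a time through the default-scrape-settings-enabled section of the ama-metrics-settings-configmap. The following sketch shows one way to generate the lines for that section. It assumes each setting is written as a `<target-name> = true` or `<target-name> = false` line; `render_scrape_settings` is a hypothetical helper, not part of any Azure tooling.

```python
# Default OFF control-plane targets listed in this article.
DEFAULT_OFF_TARGETS = [
    "controlplane-kube-controller-manager",
    "controlplane-kube-scheduler",
    "controlplane-cluster-autoscaler",
]


def render_scrape_settings(enabled_targets):
    """Return one settings line per default OFF target.

    Assumption: the section body uses `<target-name> = true|false` lines.
    """
    unknown = sorted(set(enabled_targets) - set(DEFAULT_OFF_TARGETS))
    if unknown:
        raise ValueError(f"not a default OFF target: {unknown}")
    lines = []
    for target in DEFAULT_OFF_TARGETS:
        value = "true" if target in enabled_targets else "false"
        lines.append(f"{target} = {value}")
    return "\n".join(lines)


# Example: enable scraping for the scheduler only.
print(render_scrape_settings(["controlplane-kube-scheduler"]))
```

Rejecting unknown names guards against a misspelled target, which would otherwise produce a setting that has no effect.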
For information about what metric dimensions are, see Multi-dimensional metrics.
This service has the following dimensions associated with its metrics.
Dimension Name | Description |
---|---|
requestKind | Used by metrics such as Inflight Requests to split by type of request. |
condition | Used by metrics such as Statuses for various node conditions and Number of pods in Ready state to split by condition type. |
status | Used by metrics such as Statuses for various node conditions to split by status of the condition. |
status2 | Used by metrics such as Statuses for various node conditions to split by status of the condition. |
node | Used by metrics such as CPU Usage Millicores to split by the name of the node. |
phase | Used by metrics such as Number of pods by phase to split by the phase of the pod. |
namespace | Used by metrics such as Number of pods by phase to split by the namespace of the pod. |
pod | Used by metrics such as Number of pods by phase to split by the name of the pod. |
nodepool | Used by metrics such as Disk Used Bytes to split by the name of the nodepool. |
device | Used by metrics such as Disk Used Bytes to split by the name of the device. |
3gppGen | Used by metrics such as Number of Active PDU Sessions. |
Cause | Used by metrics such as User plane packet drop rate. |
Direction | Used by metrics such as User plane bandwidth. |
Dnn | Used by metrics such as PDU session establishment attempts rate. |
Interface | Used by metrics such as User plane bandwidth. |
LUN | Used by metrics such as Percentage of data disk bandwidth consumed. |
PccpId | Used by metrics such as Number of Active PDU Sessions. |
Result | Used by metrics such as Authentication failure rate. |
SiteId | Used by metrics such as Number of Active PDU Sessions. |
Tai | Used by metrics such as Service request failure rate. |
VMName | Used by metrics such as Amount of physical memory. |
This section lists the types of resource logs you can collect for this service. The section pulls from the list of all resource logs category types supported in Azure Monitor.
The following table lists a few example operations related to AKS that might be recorded in the Activity log. Use the Activity log to track information such as when a cluster is created or when its configuration changes. You can view this information in the portal or by using other methods. You can also use it to create an Activity log alert so that you're proactively notified when an event occurs.
Operation | Description |
---|---|
Microsoft.ContainerService/managedClusters/write | Create or update managed cluster |
Microsoft.ContainerService/managedClusters/delete | Delete Managed Cluster |
Microsoft.ContainerService/managedClusters/listClusterMonitoringUserCredential/action | List clusterMonitoringUser credential |
Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action | List clusterAdmin credential |
Microsoft.ContainerService/managedClusters/agentpools/write | Create or Update Agent Pool |
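As an illustration of how the operations in the table might be used, the following sketch filters a list of Activity log records down to the AKS operations listed above. The record shape is a simplified assumption (a dictionary with a single `operationName` string), the sample records are hypothetical, and `describe_aks_operations` is a hypothetical helper. Names are compared case-insensitively as a precaution, because exported records don't always preserve the casing shown in the table.

```python
# Operation names and descriptions copied from the table above.
AKS_OPERATIONS = {
    "Microsoft.ContainerService/managedClusters/write":
        "Create or update managed cluster",
    "Microsoft.ContainerService/managedClusters/delete":
        "Delete Managed Cluster",
    "Microsoft.ContainerService/managedClusters/listClusterMonitoringUserCredential/action":
        "List clusterMonitoringUser credential",
    "Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action":
        "List clusterAdmin credential",
    "Microsoft.ContainerService/managedClusters/agentpools/write":
        "Create or Update Agent Pool",
}


def describe_aks_operations(records):
    """Return (operation, description) pairs for records that match the table."""
    by_lowercase_name = {
        name.lower(): (name, text) for name, text in AKS_OPERATIONS.items()
    }
    matches = []
    for record in records:
        name = str(record.get("operationName", "")).lower()
        if name in by_lowercase_name:
            matches.append(by_lowercase_name[name])
    return matches


# Hypothetical, simplified records; real Activity log entries carry more fields.
sample_records = [
    {"operationName": "MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/WRITE"},
    {"operationName": "Microsoft.Compute/virtualMachines/restart/action"},
    {"operationName": "Microsoft.ContainerService/managedClusters/agentpools/write"},
]

for operation, description in describe_aks_operations(sample_records):
    print(f"{operation}: {description}")
```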
- See Monitor Azure Kubernetes Service for a description of monitoring AKS.
- See Monitor Azure resources with Azure Monitor for details on monitoring Azure resources.