Enable monitoring for Azure Kubernetes Service (AKS) clusters

As described in Kubernetes monitoring in Azure Monitor, multiple features of Azure Monitor work together to provide complete monitoring of your Azure Kubernetes Service (AKS) clusters. This article describes how to enable the following features for AKS clusters:

Prometheus metrics
Managed Grafana
Container logging
Control plane logs

Prerequisites

You need at least Contributor access to the cluster for onboarding.
To link an Azure Monitor Workspace with an existing Managed Grafana workspace as part of onboarding, you need either Owner access or at least Contributor and User Access Administrator roles.
You need Monitoring Reader or Monitoring Contributor roles to view data after monitoring is enabled.

Important

If your clusters connect to the Azure Monitor workspace or Log Analytics workspace by using Azure private link, see Enable private link for monitoring virtual machines and Kubernetes clusters in Azure Monitor.

Create workspaces

The following table describes the workspaces that are required to support the Azure Monitor features enabled in this article. If you don't already have an existing workspace of each type, you can create them as part of the onboarding process. See Design a Log Analytics workspace architecture for guidance on how many workspaces to create and where they should be placed.

Feature	Workspace	Notes
Managed Prometheus	Azure Monitor workspace	If you don't specify an existing Azure Monitor workspace when onboarding, the default workspace for the resource group is used. If a default workspace doesn't already exist in the cluster's region, one with a name in the format `DefaultAzureMonitorWorkspace-<mapped_region>` is created in a resource group with the name `DefaultRG-<cluster_region>`. `Contributor` permission is enough to enable the addon to send data to the Azure Monitor workspace. You need `Owner` level permission to link your Azure Monitor workspace to view metrics in Azure Managed Grafana. This is required because the user executing the onboarding step needs to be able to give the Azure Managed Grafana system identity the `Monitoring Reader` role on the Azure Monitor workspace to query the metrics.
Container logging, Control plane logs, Container insights	Log Analytics workspace	You can attach a cluster to a Log Analytics workspace in a different Azure subscription in the same Microsoft Entra tenant, but you must use the Azure CLI or an Azure Resource Manager template. You can't currently perform this configuration with the Azure portal. If you're connecting an existing cluster to a Log Analytics workspace in another subscription, the Microsoft.ContainerService resource provider must be registered in the subscription with the Log Analytics workspace. For more information, see Register resource provider. If you don't specify an existing Log Analytics workspace, the default workspace for the resource group is used. If a default workspace doesn't already exist in the cluster's region, one is created with a name in the format `DefaultWorkspace-<GUID>-<Region>`.
Managed Grafana	Azure Managed Grafana workspace	Link your Grafana workspace to your Azure Monitor workspace to make the Prometheus metrics collected from your cluster available to Grafana dashboards.

Prometheus metrics and container insights

When you enable Prometheus and container insights on a cluster, a containerized version of the Azure Monitor agent is installed in the cluster. You can configure these features at the same time on a new or existing cluster, or enable each feature separately.

Enable Managed Grafana for your cluster at the same time that you enable scraping of Prometheus metrics. See Link a Grafana workspace for options to connect your Azure Monitor workspace and Azure Managed Grafana workspace.

Prerequisites

The cluster must use managed identity authentication.
The following resource providers must be registered in the subscription of the cluster and the Azure Monitor workspace:
- Microsoft.ContainerService
- Microsoft.Insights
- Microsoft.AlertsManagement
- Microsoft.Monitor
The following resource providers must be registered in the subscription of the Grafana workspace:
- Microsoft.Dashboard
Managed identity authentication is the default in CLI version 2.49.0 or higher.
The aks-preview extension must be uninstalled from AKS clusters using the command az extension remove --name aks-preview.
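The provider registration prerequisite can also be handled from the command line. The following is a minimal sketch, assuming the Azure CLI is installed and signed in. The function name `register_monitoring_providers` is hypothetical, and the namespaces are the ones listed above for the cluster and Azure Monitor workspace subscription.

```shell
# Resource providers listed in the prerequisites for the subscription of the
# cluster and the Azure Monitor workspace.
REQUIRED_PROVIDERS="Microsoft.ContainerService Microsoft.Insights Microsoft.AlertsManagement Microsoft.Monitor"

# Register each required provider in the given subscription.
# $1: subscription ID (a placeholder supplied by the caller)
register_monitoring_providers() (
  subscription_id="$1"
  for namespace in $REQUIRED_PROVIDERS; do
    # Stop at the first provider that fails to register.
    az provider register --namespace "$namespace" --subscription "$subscription_id" || exit 1
  done
)
```

Call it as `register_monitoring_providers <subscription-id>`. For the subscription of the Grafana workspace, the equivalent single command would be `az provider register --namespace Microsoft.Dashboard --subscription <grafana-subscription-id>`.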

The cluster must use managed identity authentication.
The following resource providers must be registered in the subscription of the cluster and the Azure Monitor workspace:
- Microsoft.ContainerService
- Microsoft.Insights
- Microsoft.AlertsManagement
- Microsoft.Monitor
The following resource providers must be registered in the subscription of the Grafana workspace:
- Microsoft.Dashboard
The Azure Monitor workspace and Azure Managed Grafana instance must already be created.
The template must be deployed in the same resource group as the Azure Managed Grafana instance.
If the Azure Managed Grafana instance is in a subscription other than the Azure Monitor workspace subscription, register the Azure Monitor workspace subscription with the Microsoft.Dashboard resource provider using the guidance at Register resource provider.
Users with the User Access Administrator role in the subscription of the AKS cluster can enable the Monitoring Reader role directly by deploying the template.

Note

Currently in Bicep, there's no way to explicitly scope the Monitoring Reader role assignment on a string parameter "resource ID" for an Azure Monitor workspace like in an ARM template. Bicep expects a value of type resource | tenant. There's also no REST API spec for an Azure Monitor workspace.

Therefore, the default scoping for the Monitoring Reader role is on the resource group. The role is applied on the same Azure Monitor workspace by inheritance, which is the expected behavior. After you deploy this Bicep template, the Grafana instance is given Monitoring Reader permissions for all the Azure Monitor workspaces in that resource group.

The cluster must use managed identity authentication.
The following resource providers must be registered in the subscription of the cluster and the Azure Monitor workspace:
- Microsoft.ContainerService
- Microsoft.Insights
- Microsoft.AlertsManagement
- Microsoft.Monitor
The following resource providers must be registered in the subscription of the Grafana workspace:
- Microsoft.Dashboard
The Azure Monitor workspace and Azure Managed Grafana workspace must already be created.
The template needs to be deployed in the same resource group as the Azure Managed Grafana workspace.
Users with the User Access Administrator role in the subscription of the AKS cluster can enable the Monitoring Reader role directly by deploying the template.
If the Azure Managed Grafana instance is in a subscription other than the Azure Monitor workspace subscription, register the Azure Monitor workspace subscription with the Microsoft.Dashboard resource provider using the guidance at Register resource provider.

Enable Prometheus metrics on an AKS cluster

Enable Prometheus metrics on an AKS cluster using the --enable-azure-monitor-metrics option with the az aks create command for a new cluster or the az aks update command for an existing cluster. This option uses the configuration described in Default Prometheus metrics configuration in Azure Monitor. To modify this configuration, see Customize scraping of Prometheus metrics in Azure Monitor managed service for Prometheus.

Example commands:

### Use default Azure Monitor workspace
az aks create/update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group>

### Use existing Azure Monitor workspace
az aks create/update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group> --azure-monitor-workspace-resource-id <workspace-name-resource-id>

### Use an existing Azure Monitor workspace and link with an existing Grafana workspace
az aks create/update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group> --azure-monitor-workspace-resource-id <azure-monitor-workspace-name-resource-id> --grafana-resource-id <grafana-workspace-name-resource-id>

### Use optional parameters
az aks create/update --enable-azure-monitor-metrics --name <cluster-name> --resource-group <cluster-resource-group> --ksm-metric-labels-allow-list "namespaces=[k8s-label-1,k8s-label-n]" --ksm-metric-annotations-allow-list "pods=[k8s-annotation-1,k8s-annotation-n]"

Optional parameters

The example commands allow the following optional parameters. The parameter name is different for each, but their use is the same.

Parameter	Name and description
Annotation keys	`--ksm-metric-annotations-allow-list` Comma-separated list of Kubernetes annotation keys used in the resource's `kube_resource_annotations` metric. For example, kube_pod_annotations is the annotations metric for the pods resource. By default, this metric contains only name and namespace labels. To include more annotations, provide a list of resource names in their plural form and the Kubernetes annotation keys that you want to allow for them. A single `*` can be provided for each resource to allow any annotations, but this has severe performance implications. For example, `pods=[kubernetes.io/team,...],namespaces=[kubernetes.io/team],...`.
Label keys	`--ksm-metric-labels-allow-list` Comma-separated list of additional Kubernetes label keys used in the resource's `kube_resource_labels` metric. For example, kube_pod_labels is the labels metric for the pods resource. By default, this metric contains only name and namespace labels. To include more labels, provide a list of resource names in their plural form and the Kubernetes label keys that you want to allow for them. A single `*` can be provided for each resource to allow any labels, but this has severe performance implications. For example, `pods=[app],namespaces=[k8s-label-1,k8s-label-n,...],...`.
Recording rules	`--enable-windows-recording-rules` Lets you enable the recording rule groups required for proper functioning of the Windows dashboards.

Note

The parameters set using --ksm-metric-annotations-allow-list and --ksm-metric-labels-allow-list can be overridden or alternatively set using the ama-metrics-settings-configmap.
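The allow-list values follow a `resource=[key1,key2]` pattern, with entries for different resources separated by commas. The following sketch shows one way to assemble an entry before passing it to either allow-list option. The helper name `allow_list_entry` is hypothetical and isn't part of the Azure CLI.

```shell
# Build one "resource=[key1,key2,...]" entry for an allow-list option.
# $1: resource name in plural form (for example, pods)
# remaining arguments: the label or annotation keys to allow
allow_list_entry() (
  resource="$1"
  shift
  keys=""
  for key in "$@"; do
    # Add a comma only between keys, not before the first one.
    keys="${keys:+$keys,}$key"
  done
  printf '%s=[%s]\n' "$resource" "$keys"
)
```

For example, `labels="$(allow_list_entry pods app),$(allow_list_entry namespaces k8s-label-1 k8s-label-n)"` produces a value that can be quoted and passed as `--ksm-metric-labels-allow-list "$labels"`.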

Retrieve required values for Grafana resource

If the Azure Managed Grafana instance is already linked to an Azure Monitor workspace, you must include its existing list of integrations in the template, or the list will be overwritten. On the Overview page for the Azure Managed Grafana instance in the Azure portal, select JSON view, and copy the value of azureMonitorWorkspaceIntegrations, which looks similar to the following sample. If it doesn't exist, the instance hasn't been linked with any Azure Monitor workspace.

"properties": {
    "grafanaIntegrations": {
        "azureMonitorWorkspaceIntegrations": [
            {
                "azureMonitorWorkspaceResourceId": "full_resource_id_1"
            },
            {
                "azureMonitorWorkspaceResourceId": "full_resource_id_2"
            }
        ]
    }
}
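As an alternative to the portal's JSON view, the same property can likely be read with the Azure CLI. The following is a sketch, assuming the Azure CLI is installed and signed in and that you have read access to the Grafana instance. The function name `show_grafana_integrations` is hypothetical, and the query path mirrors the sample above.

```shell
# Print the Azure Monitor workspace integrations currently set on an
# Azure Managed Grafana instance, as JSON.
# $1: full resource ID of the Azure Managed Grafana instance
show_grafana_integrations() (
  grafana_resource_id="$1"
  az resource show \
    --ids "$grafana_resource_id" \
    --query "properties.grafanaIntegrations.azureMonitorWorkspaceIntegrations" \
    --output json
)
```

Call it as `show_grafana_integrations <grafana-resource-id>`. If the result is empty, the instance hasn't been linked with any Azure Monitor workspace.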

Download and edit template and parameter file

Download the required files.

Bicep
- Template file: https://aka.ms/azureprometheus-enable-bicep-template
- Parameter file: https://aka.ms/azureprometheus-enable-bicep-template-parameters
- DCRA module: https://aka.ms/nested_azuremonitormetrics_dcra_clusterResourceId
- Profile module: https://aka.ms/nested_azuremonitormetrics_profile_clusterResourceId
- Azure Managed Grafana Role Assignment module: https://aka.ms/nested_grafana_amw_role_assignment
JSON
- Template file: https://aka.ms/azureprometheus-enable-arm-template
- Parameter file: https://aka.ms/azureprometheus-enable-arm-template-parameters

Edit the following values in the parameter file. The same set of values is used for both the ARM and Bicep templates. Retrieve the resource ID of the resources from the JSON View of their Overview page.

Parameter	Value
`azureMonitorWorkspaceResourceId`	Resource ID for the Azure Monitor workspace. Retrieve from the JSON view on the Overview page for the Azure Monitor workspace.
`azureMonitorWorkspaceLocation`	Location of the Azure Monitor workspace. Retrieve from the JSON view on the Overview page for the Azure Monitor workspace.
`clusterResourceId`	Resource ID for the AKS cluster. Retrieve from the JSON view on the Overview page for the cluster.
`clusterLocation`	Location of the AKS cluster. Retrieve from the JSON view on the Overview page for the cluster.
`metricLabelsAllowlist`	Comma-separated list of Kubernetes label keys to be used in the resource's labels metric.
`metricAnnotationsAllowList`	Comma-separated list of Kubernetes annotation keys to be used in the resource's annotations metric.
`grafanaResourceId`	Resource ID for the managed Grafana instance. Retrieve from the JSON view on the Overview page for the Grafana instance.
`grafanaLocation`	Location for the managed Grafana instance. Retrieve from the JSON view on the Overview page for the Grafana instance.
`grafanaSku`	SKU for the managed Grafana instance. Retrieve from the JSON view on the Overview page for the Grafana instance. Use the sku.name.

Open the template file and update the grafanaIntegrations property at the end of the file with the values that you retrieved from the Grafana instance. This looks similar to the following samples. In these samples, full_resource_id_1 and full_resource_id_2 were already in the Azure Managed Grafana resource JSON. The final azureMonitorWorkspaceResourceId entry is already in the template and is used to link to the Azure Monitor workspace resource ID provided in the parameters file.

Bicep

    resource grafanaResourceId_8 'Microsoft.Dashboard/grafana@2022-08-01' = {
        name: split(grafanaResourceId, '/')[8]
        sku: {
            name: grafanaSku
        }
        identity: {
            type: 'SystemAssigned'
        }
        location: grafanaLocation
        properties: {
            grafanaIntegrations: {
                azureMonitorWorkspaceIntegrations: [
                    {
                        azureMonitorWorkspaceResourceId: 'full_resource_id_1'
                    }
                    {
                        azureMonitorWorkspaceResourceId: 'full_resource_id_2'
                    }
                    {
                        azureMonitorWorkspaceResourceId: azureMonitorWorkspaceResourceId
                    }
                ]
            }
        }
    }

JSON

{
    "type": "Microsoft.Dashboard/grafana",
    "apiVersion": "2022-08-01",
    "name": "[split(parameters('grafanaResourceId'),'/')[8]]",
    "sku": {
        "name": "[parameters('grafanaSku')]"
    },
    "location": "[parameters('grafanaLocation')]",
    "properties": {
        "grafanaIntegrations": {
            "azureMonitorWorkspaceIntegrations": [
                {
                    "azureMonitorWorkspaceResourceId": "full_resource_id_1"
                },
                {
                    "azureMonitorWorkspaceResourceId": "full_resource_id_2"
                },
                {
                    "azureMonitorWorkspaceResourceId": "[parameters('azureMonitorWorkspaceResourceId')]"
                }
            ]
        }
    }
}

Deploy the template with the parameter file by using any valid method for deploying Resource Manager templates. For examples of different methods, see Deploy the sample templates.
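One valid deployment method is the Azure CLI. The following sketch wraps `az deployment group create`, assuming the Azure CLI is installed and signed in. The function name `deploy_monitoring_template` is hypothetical, and the file names are whatever you saved the downloaded template and parameter files as.

```shell
# Deploy a template and its parameter file to a resource group.
# $1: resource group to deploy into
# $2: path to the template file (ARM JSON or Bicep)
# $3: path to the JSON parameter file
deploy_monitoring_template() (
  resource_group="$1"
  template_file="$2"
  parameter_file="$3"
  az deployment group create \
    --resource-group "$resource_group" \
    --template-file "$template_file" \
    --parameters "@$parameter_file"
)
```

Call it as `deploy_monitoring_template <grafana-resource-group> <template-file> <parameter-file>`. Per the prerequisites, the target resource group is the one that contains the Azure Managed Grafana instance.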

Retrieve required values for Grafana resource

"properties": {
    "grafanaIntegrations": {
        "azureMonitorWorkspaceIntegrations": [
            {
                "azureMonitorWorkspaceResourceId": "full_resource_id_1"
            },
            {
                "azureMonitorWorkspaceResourceId": "full_resource_id_2"
            }
        ]
    }
}

Update the azure_monitor_workspace_integrations block in main.tf with the list of Grafana integrations.

  azure_monitor_workspace_integrations {
    resource_id = var.monitor_workspace_id1
  }
  azure_monitor_workspace_integrations {
    resource_id = var.monitor_workspace_id2
  }

Download and edit templates

New AKS cluster

Download all files under AddonTerraformTemplate.
Edit the variables in variables.tf file with the correct parameter values.
Run terraform init -upgrade to initialize the Terraform deployment.
Run terraform plan -out main.tfplan to create an execution plan.
Run terraform apply main.tfplan to apply the execution plan to your cloud infrastructure.
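The three Terraform steps above can be chained so that a failure in one step prevents the next from running. The following is a minimal sketch, assuming Terraform is installed and on the PATH. The function name `apply_terraform_template` is hypothetical.

```shell
# Run the Terraform steps in order, stopping at the first failure.
# $1: directory that contains the downloaded main.tf and variables.tf
apply_terraform_template() (
  cd "$1" || exit 1
  # init prepares the working directory, plan writes the execution plan to
  # main.tfplan, and apply executes exactly that saved plan.
  terraform init -upgrade &&
    terraform plan -out main.tfplan &&
    terraform apply main.tfplan
)
```

Call it as `apply_terraform_template <template-directory>` after editing variables.tf. Because the function body runs in a subshell, the caller's working directory is unchanged.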

Pass the variables for annotations_allowed and labels_allowed keys in main.tf only when those values exist. These blocks are optional.

Note

Edit the main.tf file appropriately before running the Terraform template. Add any existing azure_monitor_workspace_integrations values to the Grafana resource before running the template. Otherwise, the existing values are deleted and replaced with the values in the template during deployment. Users with the User Access Administrator role in the subscription of the AKS cluster can enable the Monitoring Reader role directly by deploying the template. Edit the grafanaSku parameter if you're using a nonstandard SKU, and run this template in the resource group of the Grafana resource.

You can enable Prometheus metrics and container logs when you create a new AKS cluster or on an existing cluster in the Azure portal. In both cases, the configuration experience is the same.

New AKS cluster

When you create a new AKS cluster in the Azure portal, configure monitoring in the Monitoring tab.

Existing cluster

Navigate to your cluster in the Azure portal. In the service menu, select Monitor and then Monitor Settings.

Configuration options

Configuration options are the same for both new and existing clusters. The only difference is you might need to select Advanced settings to view all options for an existing cluster.

Prometheus metrics, Grafana, and Container Logs and events are selected for you. If you have an existing Azure Monitor workspace, Grafana workspace, and Log Analytics workspace, they're selected for you. Select Advanced settings if you want to select alternate workspaces or create new ones.

For container logs, you must select a logging profile, which defines which logs are collected and at what frequency. The available profiles are listed in the following table.

Cost preset	Collection frequency	Namespace filters	Syslog collection	Collected data
Logs and Events (Default)	1m	None	Not enabled	ContainerLogV2 KubeEvents KubePodInventory
Syslog	1m	None	Enabled by default	All standard container insights tables
Standard	1m	None	Not enabled	All standard container insights tables
Cost-optimized	5m	Excludes kube-system, gatekeeper-system, azure-arc	Not enabled	All standard container insights tables

If you want to customize the settings, select Edit collection settings. Each of these settings is described in the following table.

Name	Description
Collection frequency	Determines how often the agent collects data. Valid values are 1m - 30m in 1m intervals. The default value is 1m. This option can't be configured through the ConfigMap.
Namespace filtering	Off: Collects data on all namespaces. Include: Collects only data from the values in the namespaces field. Exclude: Collects data from all namespaces except for the values in the namespaces field. The namespaces field is an array of comma-separated Kubernetes namespaces for which inventory and perf data is collected based on the namespaceFilteringMode. For example, namespaces = ["kube-system", "default"] with an Include setting collects only these two namespaces. With an Exclude setting, the agent collects data from all other namespaces except for kube-system and default.
Collected Data	Defines which Container insights tables to collect. See the next table for a description of each grouping.
Enable ContainerLogV2	Boolean flag to enable ContainerLogV2 schema. If set to true, the stdout/stderr Logs are ingested to ContainerLogV2 table. If not, the container logs are ingested to ContainerLog table, unless otherwise specified in the ConfigMap. When specifying the individual streams, you must include the corresponding table for ContainerLog or ContainerLogV2. Unchecking this setting does not disable data collection. It only specifies which table collected data is sent to.
Enable Syslog collection	Enables Syslog collection from the cluster.

The Collected data option allows you to select the tables that are populated for the cluster. The tables are grouped by the most common scenarios.

Grouping	Tables	Notes
All (Default)	All standard container insights tables	Required for enabling the default Container insights visualizations
Performance	Perf, InsightsMetrics
Logs and events	ContainerLog or ContainerLogV2, KubeEvents, KubePodInventory	Recommended if you have enabled managed Prometheus metrics
Workloads, Deployments, and HPAs	InsightsMetrics, KubePodInventory, KubeEvents, ContainerInventory, ContainerNodeInventory, KubeNodeInventory, KubeServices
Persistent Volumes	InsightsMetrics, KubePVInventory

Download Azure Policy template and parameter files.
- Template file: https://aka.ms/AddonPolicyMetricsProfile
- Parameter file: https://aka.ms/AddonPolicyMetricsProfile.parameters

Create the policy definition using the following CLI command:

az policy definition create --name "Prometheus Metrics addon" --display-name "Prometheus Metrics addon" --mode Indexed --metadata version=1.0.0 category=Kubernetes --rules AddonPolicyMetricsProfile.rules.json --params AddonPolicyMetricsProfile.parameters.json

After you create the policy definition, in the Azure portal, select Policy and then Definitions. Select the policy definition you created.
Select Assign and fill in the details on the Parameters tab. Select Review + Create.
If you want to apply the policy to an existing cluster, create a Remediation task for that cluster resource from Policy Assignment.

After the policy is assigned to the subscription, whenever you create a new cluster without Prometheus enabled, the policy will run and deploy to enable Prometheus monitoring.

Enable container insights and logging on an AKS cluster

Enable container insights and container logging on an AKS cluster using the --enable-addons monitoring option with the az aks create command for a new cluster or the az aks enable-addons command with the --addon monitoring option to update an existing cluster.

Example commands:

### Use default Log Analytics workspace
az aks enable-addons --addon monitoring --name <cluster-name> --resource-group <cluster-resource-group-name>

### Use existing Log Analytics workspace
az aks enable-addons --addon monitoring --name <cluster-name> --resource-group <cluster-resource-group-name> --workspace-resource-id <workspace-resource-id>

### Use custom log configuration file
az aks enable-addons --addon monitoring --name <cluster-name> --resource-group <cluster-resource-group-name> --workspace-resource-id <workspace-resource-id> --data-collection-settings dataCollectionSettings.json

Log configuration file

To customize log collection settings for the cluster, you can provide the configuration as a JSON file using the following format. If you don't provide a configuration file, the default settings identified in the table are used.

{
  "interval": "1m",
  "namespaceFilteringMode": "Include",
  "namespaces": ["kube-system"],
  "enableContainerLogV2": true, 
  "streams": ["Microsoft-Perf", "Microsoft-ContainerLogV2"]
}

The following table describes each of the settings in the log configuration file:

Name	Description
`interval`	Determines how often the agent collects data. Valid values are 1m - 30m in 1m intervals. If the value is outside the allowed range, then it defaults to 1m. Default: 1m.
`namespaceFilteringMode`	Include: Collects only data from the values in the namespaces field. Exclude: Collects data from all namespaces except for the values in the namespaces field. Off: Ignores any namespace selections and collects data on all namespaces. Default: Off
`namespaces`	Array of comma-separated Kubernetes namespaces to collect inventory and perf data based on the namespaceFilteringMode. For example, namespaces = ["kube-system", "default"] with an Include setting collects only these two namespaces. With an Exclude setting, the agent collects data from all other namespaces except for kube-system and default. With an Off setting, the agent collects data from all namespaces including kube-system and default. Invalid and unrecognized namespaces are ignored. Default: None.
`enableContainerLogV2`	Boolean flag to enable ContainerLogV2 schema. If set to true, the stdout/stderr Logs are ingested to ContainerLogV2 table. If not, the container logs are ingested to ContainerLog table, unless otherwise specified in the ConfigMap. When specifying the individual streams, you must include the corresponding table for ContainerLog or ContainerLogV2. Default: True
`streams`	An array of table streams to collect. See Stream values for a list of the valid streams and their corresponding tables. Default: Microsoft-ContainerInsights-Group-Default
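The `interval` and namespace filtering rules in the table can be expressed as two small checks to run before writing dataCollectionSettings.json. The following is an illustrative sketch of the documented behavior, not the agent's actual implementation. The function names `normalize_interval` and `namespace_is_collected` are hypothetical, and treating malformed interval values and unrecognized modes the same way as the defaults is an assumption.

```shell
# Echo the interval unchanged when it is a whole number of minutes from 1m
# to 30m. Otherwise echo 1m, the documented fallback for out-of-range values
# (applying it to malformed values as well is an assumption).
normalize_interval() (
  value="$1"
  minutes="${value%m}"
  case "$minutes" in
    '' | *[!0-9]*) echo "1m"; exit 0 ;;
  esac
  if [ "$value" = "${minutes}m" ] && [ "$minutes" -ge 1 ] && [ "$minutes" -le 30 ]; then
    echo "$value"
  else
    echo "1m"
  fi
)

# Return success (0) if data from a namespace would be collected.
# $1: namespaceFilteringMode (Include, Exclude, or Off)
# $2: namespace to check
# remaining arguments: entries of the namespaces array
namespace_is_collected() (
  mode="$1"
  namespace="$2"
  shift 2
  listed=no
  for entry in "$@"; do
    if [ "$entry" = "$namespace" ]; then listed=yes; fi
  done
  case "$mode" in
    Include) [ "$listed" = yes ] ;;
    Exclude) [ "$listed" = no ] ;;
    *) true ;;  # Off (or an unrecognized mode): selections are ignored
  esac
)
```

For example, `namespace_is_collected Include default kube-system` returns a nonzero status because `default` isn't in the namespaces list, which matches the sample configuration file above.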

Important

The template must be deployed in the same resource group as the cluster.

Download and install template

Download and edit template and parameter file.

Bicep
- Template file (Syslog): https://aka.ms/enable-monitoring-msi-syslog-bicep-template
- Parameter file (Syslog): https://aka.ms/enable-monitoring-msi-syslog-bicep-parameters
- Template file (No Syslog): https://aka.ms/enable-monitoring-msi-bicep-template
- Parameter file (No Syslog): https://aka.ms/enable-monitoring-msi-bicep-parameters
ARM
- Template file: https://aka.ms/aks-enable-monitoring-msi-onboarding-template-file
- Parameter file: https://aka.ms/aks-enable-monitoring-msi-onboarding-template-parameter-file

Parameter	Description
`aksResourceId`	Resource ID of the cluster.
`aksResourceLocation`	Location of the cluster.
`workspaceResourceId`	Resource ID of the Log Analytics workspace.
`resourceTagValues`	Tag values specified for the existing Container insights extension data collection rule (DCR) of the cluster and the name of the DCR. The name is `MSCI-<clusterName>-<clusterRegion>`, and this resource is created in the AKS cluster's resource group. For first-time onboarding, you can set arbitrary tag values.
`enableRetinaNetworkFlowLogs`	Flag to indicate whether to enable Retina Network Flow Logs.
`enableContainerLogV2`	Boolean flag to enable ContainerLogV2 schema. If set to true, the stdout/stderr Logs are sent to ContainerLogV2 table. If not, the container logs are sent to ContainerLog table, unless otherwise specified in the ConfigMap. When specifying the individual streams, you must include the corresponding table for ContainerLog or ContainerLogV2.
`enableSyslog`	Specifies whether Syslog collection should be enabled.
`syslogLevels`	If Syslog collection is enabled, it specifies the log levels to collect.
`dataCollectionInterval`	Determines how often the agent collects data. Valid values are 1m - 30m in 1m intervals. The default value is 1m. If the value is outside the allowed range, then it defaults to 1m.
`namespaceFilteringModeForDataCollection`	Include: Collects only data from the values in the namespaces field. Exclude: Collects data from all namespaces except for the values in the namespaces field. Off: Ignores any namespace selections and collects data on all namespaces.
`namespacesForDataCollection`	Array of comma separated Kubernetes namespaces to collect inventory and perf data based on the namespaceFilteringMode. For example, namespaces = ["kube-system", "default"] with an Include setting collects only these two namespaces. With an Exclude setting, the agent collects data from all other namespaces except for kube-system and default. With an Off setting, the agent collects data from all namespaces including kube-system and default. Invalid and unrecognized namespaces are ignored.
`streams`	An array of table streams. See Stream values for a list of the valid streams and their corresponding tables.
`useAzureMonitorPrivateLinkScope`	Specifies whether to use private link for the cluster connection to Azure Monitor.
`azureMonitorPrivateLinkScopeResourceId`	If private link is used, resource ID of the private link scope.

Deploy the template with the parameter file by using any valid method for deploying Resource Manager templates. For examples of different methods, see Deploy the sample templates.

New AKS cluster

Download Terraform template file depending on whether you want to enable Syslog collection.
- Syslog: https://aka.ms/enable-monitoring-msi-syslog-terraform
- No Syslog: https://aka.ms/enable-monitoring-msi-terraform
Adjust the azurerm_kubernetes_cluster resource in main.tf based on your cluster settings.

Update parameters in variables.tf to replace values in "<>":

Parameter	Description
`aks_resource_group_name`	Use the values on the AKS Overview page for the resource group.
`resource_group_location`	Use the values on the AKS Overview page for the resource group.
`cluster_name`	Define the cluster name that you would like to create.
`workspace_resource_id`	Use the resource ID of your Log Analytics workspace.
`workspace_region`	Use the location of your Log Analytics workspace.
`resource_tag_values`	Match the existing tag values specified for the existing Container insights extension data collection rule (DCR) of the cluster and the name of the DCR. The name matches `MSCI-<clusterName>-<clusterRegion>`, and this resource is created in the same resource group as the AKS cluster. For first-time onboarding, you can set arbitrary tag values.
`enabledContainerLogV2`	Set this parameter to true to use the default recommended ContainerLogV2.
Cost optimization parameters	Refer to Data collection parameters
`streams`	Streams for data collection. See Stream values.
`use_azure_monitor_private_link_scope`	Flag to indicate whether to configure Azure Monitor Private Link Scope.
`azure_monitor_private_link_scope_resource_id`	Azure Resource ID of the Azure Monitor Private Link Scope.

Run terraform init -upgrade to initialize the Terraform deployment.
Run terraform plan -out main.tfplan to create an execution plan.
Run terraform apply main.tfplan to apply the execution plan to your cloud infrastructure.

Existing AKS cluster

Import the existing cluster resource first with the command: terraform import azurerm_kubernetes_cluster.k8s <aksResourceId>.

Add the oms_agent add-on profile to the existing azurerm_kubernetes_cluster resource.

oms_agent {
  log_analytics_workspace_id      = var.workspace_resource_id
  msi_auth_for_monitoring_enabled = true
}

Copy the DCR and DCRA resources from the Terraform templates.
Run terraform plan -out main.tfplan and make sure the change is adding the oms_agent property. Note: If the azurerm_kubernetes_cluster resource definition differs from the existing cluster during terraform plan, the existing cluster is destroyed and recreated.
Run terraform apply main.tfplan to apply the execution plan to your cloud infrastructure.

Tip

Edit the main.tf file appropriately before running the terraform template.
Data will start flowing after 10 minutes since the cluster needs to be ready first.
WorkspaceID needs to match the format /subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/example-resource-group/providers/Microsoft.OperationalInsights/workspaces/workspaceValue.
If resource group already exists, run terraform import azurerm_resource_group.rg /subscriptions/<Subscription_ID>/resourceGroups/<Resource_Group_Name> before terraform plan.
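A quick shape check on the workspace ID can catch a short workspace name or GUID supplied in place of the full resource ID. The following sketch compares a value against the format shown in the tip. The function name `is_workspace_resource_id` is hypothetical, and the check is deliberately loose: it's case-sensitive and doesn't confirm that the workspace exists.

```shell
# Return success (0) if the argument has the shape of a Log Analytics
# workspace resource ID:
# /subscriptions/<id>/resourceGroups/<name>/providers/Microsoft.OperationalInsights/workspaces/<name>
is_workspace_resource_id() (
  case "$1" in
    /subscriptions/?*/resourceGroups/?*/providers/Microsoft.OperationalInsights/workspaces/?*)
      exit 0 ;;
    *)
      exit 1 ;;
  esac
)
```

For example, `is_workspace_resource_id "$workspace_resource_id" || echo "Not a full workspace resource ID"` reports a value that would likely be rejected, where `$workspace_resource_id` is a shell variable you set to the intended value.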

You can enable Prometheus metrics and container logs when you create a new AKS cluster or on an existing cluster in the Azure portal. In both cases, the configuration experience is the same.

New AKS cluster

When you create a new AKS cluster in the Azure portal, configure monitoring in the Monitoring tab.

Existing cluster

Navigate to your cluster in the Azure portal. In the service menu, select Monitor and then Monitor Settings.

Configuration options

Configuration options are the same for both new and existing clusters. The only difference is you might need to select Advanced settings to view all options for an existing cluster.

For container logs, you must select a logging profile, which defines which logs are collected and at what frequency. The available profiles are listed in the following table.

Cost preset	Collection frequency	Namespace filters	Syslog collection	Collected data
Logs and Events (Default)	1m	None	Not enabled	ContainerLogV2 KubeEvents KubePodInventory
Syslog	1m	None	Enabled by default	All standard container insights tables
Standard	1m	None	Not enabled	All standard container insights tables
Cost-optimized	5m	Excludes kube-system, gatekeeper-system, azure-arc	Not enabled	All standard container insights tables

If you want to customize the settings, select Edit collection settings. Each of these settings is described in the following table.

Name	Description
Collection frequency	Determines how often the agent collects data. Valid values are 1m - 30m in 1m intervals. The default value is 1m. This option can't be configured through the ConfigMap.
Namespace filtering	Off: Collects data on all namespaces. Include: Collects data only from the namespaces listed in the namespaces field. Exclude: Collects data from all namespaces except those listed in the namespaces field. The namespaces field is an array of comma-separated Kubernetes namespaces for which inventory and perf data is collected, based on the namespaceFilteringMode. For example, namespaces = ["kube-system", "default"] with an Include setting collects only these two namespaces. With an Exclude setting, the agent collects data from all namespaces except kube-system and default.
Collected Data	Defines which Container insights tables to collect. See the next table for a description of each grouping.
Enable ContainerLogV2	Boolean flag to enable the ContainerLogV2 schema. If set to true, stdout/stderr logs are ingested into the ContainerLogV2 table. If not, container logs are ingested into the ContainerLog table, unless otherwise specified in the ConfigMap. When specifying the individual streams, you must include the corresponding table for ContainerLog or ContainerLogV2.
Enable Syslog collection	Enables Syslog collection from the cluster.
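The following Python sketch models the collection frequency and namespace filtering settings described in the table. It only illustrates the documented semantics and isn't the agent's implementation. The function names are hypothetical, and the sketch ignores exceptions such as ama-logs agent records, which are collected regardless of the filter.

```python
def is_valid_interval(interval: str) -> bool:
    """Return True for values from '1m' through '30m' in whole-minute steps."""
    minutes = interval[:-1]
    if not interval.endswith("m") or not (minutes.isascii() and minutes.isdigit()):
        return False
    return 1 <= int(minutes) <= 30


def should_collect(namespace: str, mode: str, namespaces: list[str]) -> bool:
    """Decide whether data from a namespace is collected under the given filtering mode."""
    mode = mode.lower()
    if mode == "off":
        return True  # no filtering: every namespace is collected
    if mode == "include":
        return namespace in namespaces
    if mode == "exclude":
        return namespace not in namespaces
    raise ValueError(f"Unknown namespace filtering mode: {mode!r}")


selected = ["kube-system", "default"]
print(should_collect("kube-system", "Include", selected))  # True
print(should_collect("my-app", "Include", selected))       # False
print(should_collect("my-app", "Exclude", selected))       # True
print(is_valid_interval("5m"), is_valid_interval("45m"))   # True False
```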

The Collected data option allows you to select the tables that are populated for the cluster. The tables are grouped by the most common scenarios.

Grouping	Tables	Notes
All (Default)	All standard container insights tables	Required for enabling the default Container insights visualizations
Performance	Perf, InsightsMetrics
Logs and events	ContainerLog or ContainerLogV2, KubeEvents, KubePodInventory	Recommended if you have enabled managed Prometheus metrics
Workloads, Deployments, and HPAs	InsightsMetrics, KubePodInventory, KubeEvents, ContainerInventory, ContainerNodeInventory, KubeNodeInventory, KubeServices
Persistent Volumes	InsightsMetrics, KubePVInventory

Download Azure Policy template and parameter files.
- Template file: https://aka.ms/enable-monitoring-msi-azure-policy-template
- Parameter file: https://aka.ms/enable-monitoring-msi-azure-policy-parameters

Create the policy definition using the following CLI command:

az policy definition create --name "AKS-Monitoring-Addon-MSI" --display-name "AKS-Monitoring-Addon-MSI" --mode Indexed --metadata version=1.0.0 category=Kubernetes --rules azure-policy.rules.json --params azure-policy.parameters.json

After you create the policy definition, in the Azure portal, select Policy and then Definitions. Select the policy definition you created.
Select Assign and fill in the details on the Parameters tab. Select Review + Create.
If you want to apply the policy to an existing cluster, create a Remediation task for that cluster resource from Policy Assignment.

Stream values

When you specify the tables to collect using the Azure CLI or Bicep/ARM, you specify stream names that correspond to particular tables in the Log Analytics workspace. The following table lists the stream names and their corresponding tables.

Note

If you're familiar with the structure of a data collection rule, the stream names in this table are specified in the Data flows section of the DCR.

Stream	Container insights table
Microsoft-ContainerInventory	ContainerInventory
Microsoft-ContainerLog	ContainerLog
Microsoft-ContainerLogV2	ContainerLogV2
Microsoft-ContainerLogV2-HighScale	ContainerLogV2 (High scale mode)¹
Microsoft-ContainerNodeInventory	ContainerNodeInventory
Microsoft-InsightsMetrics	InsightsMetrics
Microsoft-KubeEvents	KubeEvents
Microsoft-KubeMonAgentEvents	KubeMonAgentEvents
Microsoft-KubeNodeInventory	KubeNodeInventory
Microsoft-KubePodInventory	KubePodInventory
Microsoft-KubePVInventory	KubePVInventory
Microsoft-KubeServices	KubeServices
Microsoft-Perf	Perf
Microsoft-ContainerInsights-Group-Default	Group stream that includes all of the above streams.²

¹ Don't use both Microsoft-ContainerLogV2 and Microsoft-ContainerLogV2-HighScale together. This will result in duplicate data.
² Use the group stream as a shorthand for specifying all the individual streams. If you want to collect a specific set of streams, specify each stream individually instead of using the group stream.
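As a rough aid, the following Python sketch checks a list of stream names against the table and footnotes above. The function and constant names are illustrative. Flagging a mix of the group stream with individual streams is an interpretation of the second footnote rather than a documented rule.

```python
GROUP_STREAM = "Microsoft-ContainerInsights-Group-Default"
LOG_V2 = "Microsoft-ContainerLogV2"
LOG_V2_HIGH_SCALE = "Microsoft-ContainerLogV2-HighScale"

# Stream names and the tables they populate, copied from the table above.
STREAM_TABLES = {
    "Microsoft-ContainerInventory": "ContainerInventory",
    "Microsoft-ContainerLog": "ContainerLog",
    LOG_V2: "ContainerLogV2",
    LOG_V2_HIGH_SCALE: "ContainerLogV2",
    "Microsoft-ContainerNodeInventory": "ContainerNodeInventory",
    "Microsoft-InsightsMetrics": "InsightsMetrics",
    "Microsoft-KubeEvents": "KubeEvents",
    "Microsoft-KubeMonAgentEvents": "KubeMonAgentEvents",
    "Microsoft-KubeNodeInventory": "KubeNodeInventory",
    "Microsoft-KubePodInventory": "KubePodInventory",
    "Microsoft-KubePVInventory": "KubePVInventory",
    "Microsoft-KubeServices": "KubeServices",
    "Microsoft-Perf": "Perf",
}


def check_streams(streams: list[str]) -> list[str]:
    """Return a list of problems found in a stream selection (empty if none)."""
    problems = []
    selected = set(streams)
    unknown = sorted(selected - set(STREAM_TABLES) - {GROUP_STREAM})
    if unknown:
        problems.append("Unknown streams: " + ", ".join(unknown))
    if LOG_V2 in selected and LOG_V2_HIGH_SCALE in selected:
        problems.append("ContainerLogV2 and its high scale stream together produce duplicate data")
    if GROUP_STREAM in selected and len(selected) > 1:
        problems.append("Group stream is mixed with individual streams")
    return problems


print(check_streams([LOG_V2, "Microsoft-KubeEvents", "Microsoft-KubePodInventory"]))  # []
print(check_streams([LOG_V2, LOG_V2_HIGH_SCALE]))
```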

Applicable tables and metrics

The settings for collection frequency and namespace filtering don't apply to all log data. The following tables list the tables in the Log Analytics workspace along with the settings that apply to each.

Table name	Interval?	Namespaces?	Remarks
ContainerInventory	Yes	Yes
ContainerNodeInventory	Yes	No	Data collection setting for namespaces isn't applicable since Kubernetes Node isn't a namespace scoped resource
KubeNodeInventory	Yes	No	Data collection setting for namespaces isn't applicable since Kubernetes Node isn't a namespace scoped resource
KubePodInventory	Yes	Yes
KubePVInventory	Yes	Yes
KubeServices	Yes	Yes
KubeEvents	No	Yes	Data collection setting for interval isn't applicable for the Kubernetes Events
Perf	Yes	Yes	Data collection setting for namespaces isn't applicable for the Kubernetes Node related metrics since the Kubernetes Node isn't a namespace scoped object.
InsightsMetrics	Yes	Yes	Data collection settings are only applicable for the metrics collecting the following namespaces: `container.azm.ms/kubestate`, `container.azm.ms/pv`, and `container.azm.ms/gpu`.

Note

Namespace filtering doesn't apply to ama-logs agent records. As a result, even if the kube-system namespace is listed among the excluded namespaces, records associated with the ama-logs agent container are still ingested.

Metric namespace	Interval?	Namespaces?	Remarks
Insights.container/nodes	Yes	No	Node isn't a namespace scoped resource
Insights.container/pods	Yes	Yes
Insights.container/containers	Yes	Yes
Insights.container/persistentvolumes	Yes	Yes
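If you want to reason about which tables a settings change actually affects, the Log Analytics table above can be encoded as a small lookup. The following Python sketch does this for the workspace tables only. The names are hypothetical, and the per-row caveats in the Remarks column (for example, node-related Perf metrics) aren't modeled.

```python
# (interval setting applies, namespace setting applies), copied from the table above.
SETTING_APPLIES = {
    "ContainerInventory": (True, True),
    "ContainerNodeInventory": (True, False),
    "KubeNodeInventory": (True, False),
    "KubePodInventory": (True, True),
    "KubePVInventory": (True, True),
    "KubeServices": (True, True),
    "KubeEvents": (False, True),
    "Perf": (True, True),
    "InsightsMetrics": (True, True),
}


def tables_ignoring(setting: str) -> list[str]:
    """Return the tables that the given setting ('interval' or 'namespaces') doesn't affect."""
    index = {"interval": 0, "namespaces": 1}[setting]
    return sorted(table for table, flags in SETTING_APPLIES.items() if not flags[index])


print(tables_ignoring("interval"))    # ['KubeEvents']
print(tables_ignoring("namespaces"))  # ['ContainerNodeInventory', 'KubeNodeInventory']
```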

Special scenarios

Use the following resources for configuration requirements for particular scenarios:

If you're using private link, see Enable private link for Kubernetes monitoring in Azure Monitor.
To enable high scale mode, follow the onboarding process at Enable high scale mode for Monitoring add-on. You must also update the ConfigMap as described in Update ConfigMap, and the DCR stream needs to be changed from Microsoft-ContainerLogV2 to Microsoft-ContainerLogV2-HighScale.

Enable control plane logs on an AKS cluster

Control plane logs are implemented as resource logs in Azure Monitor. To collect these logs, create a diagnostic setting for the cluster. Send them to the same Log Analytics workspace as your container logs.

Use the az monitor diagnostic-settings create command to create a diagnostic setting with the Azure CLI. See the documentation for this command for descriptions of its parameters.

The following example creates a diagnostic setting that sends all Kubernetes categories to a Log Analytics workspace. It also enables resource-specific mode, which sends the logs to the dedicated tables listed in Supported resource logs for Microsoft.ContainerService/fleets.

az monitor diagnostic-settings create \
--name 'Collect control plane logs' \
--resource /subscriptions/<subscription ID>/resourceGroups/<resource group name>/providers/Microsoft.ContainerService/managedClusters/<cluster-name> \
--workspace /subscriptions/<subscription ID>/resourcegroups/<resource group name>/providers/microsoft.operationalinsights/workspaces/<log analytics workspace name> \
--logs '[{"category": "karpenter-events","enabled": true},{"category": "kube-audit","enabled": true},{"category": "kube-apiserver","enabled": true},{"category": "kube-audit-admin","enabled": true},{"category": "kube-controller-manager","enabled": true},{"category": "kube-scheduler","enabled": true},{"category": "cluster-autoscaler","enabled": true},{"category": "cloud-controller-manager","enabled": true},{"category": "guard","enabled": true},{"category": "csi-azuredisk-controller","enabled": true},{"category": "csi-azurefile-controller","enabled": true},{"category": "csi-snapshot-controller","enabled": true},{"category": "fleet-member-agent","enabled": true},{"category": "fleet-member-net-controller-manager","enabled": true},{"category": "fleet-mcs-controller-manager","enabled": true}]' \
--metrics '[{"category": "AllMetrics","enabled": true}]' \
--export-to-resource-specific true
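The --logs value is long and easy to mistype. As a convenience, the following Python sketch generates that JSON string from a list of category names, so you can trim the list to the categories you need. The helper name is hypothetical, and the category list is copied from the command above.

```python
import json

# Categories used in the CLI example above.
CONTROL_PLANE_CATEGORIES = [
    "karpenter-events", "kube-audit", "kube-apiserver", "kube-audit-admin",
    "kube-controller-manager", "kube-scheduler", "cluster-autoscaler",
    "cloud-controller-manager", "guard", "csi-azuredisk-controller",
    "csi-azurefile-controller", "csi-snapshot-controller", "fleet-member-agent",
    "fleet-member-net-controller-manager", "fleet-mcs-controller-manager",
]


def build_logs_argument(categories: list[str], enabled: bool = True) -> str:
    """Serialize the categories as the JSON array expected by the --logs parameter."""
    return json.dumps([{"category": name, "enabled": enabled} for name in categories])


# Print a value that can be pasted between single quotes after --logs.
print(build_logs_argument(["kube-apiserver", "kube-audit-admin", "guard"]))
```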

Following are sample template and parameter files to create a diagnostic setting for control plane logs. Modify the templates to collect different categories or to send the logs to a different destination.

Bicep

param clusterName string
param workspaceId string
param settingName string

resource cluster 'Microsoft.ContainerService/managedClusters@2021-05-01-preview' existing = {
  name: clusterName
}

resource setting 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: settingName
  scope: cluster
  properties: {
    workspaceId: workspaceId
    logs: [
      {
        category: 'kube-apiserver'
        enabled: true
      }
      {
        category: 'kube-audit'
        enabled: true
      }
      {
        category: 'kube-audit-admin'
        enabled: true
      }
      {
        category: 'kube-controller-manager'
        enabled: true
      }
      {
        category: 'kube-scheduler'
        enabled: false
      }
      {
        category: 'cluster-autoscaler'
        enabled: true
      }
      {
        category: 'guard'
        enabled: true
      }
    ]
  }
}

JSON

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "clusterName": {
            "type": "String"
        },
        "workspaceId": {
            "type": "String"
        },
        "settingName": {
            "type": "String"
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/diagnosticSettings",
            "apiVersion": "2021-05-01-preview",
            "scope": "[format('Microsoft.ContainerService/managedClusters/{0}', parameters('clusterName'))]",
            "name": "[parameters('settingName')]",
            "properties": {
                "workspaceId": "[parameters('workspaceId')]",
                "logs": [
                    {
                        "category": "kube-apiserver",
                        "enabled": true
                    },
                    {
                        "category": "kube-audit",
                        "enabled": true
                    },
                    {
                        "category": "kube-audit-admin",
                        "enabled": true
                    },
                    {
                        "category": "kube-controller-manager",
                        "enabled": true
                    },
                    {
                        "category": "kube-scheduler",
                        "enabled": false
                    },
                    {
                        "category": "cluster-autoscaler",
                        "enabled": true
                    },
                    {
                        "category": "guard",
                        "enabled": true
                    }
                ]
            }
        }
    ]
}

Parameter file

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "settingName": {
      "value": "<diagnostic setting name>"
    },
    "workspaceId": {
      "value": "/subscriptions/<subscription id>/resourcegroups/<resourcegroup name>/providers/microsoft.operationalinsights/workspaces/<workspace name>"
    },
    "clusterName": {
      "value": "<cluster-name>"
    }
  }
}

Use the following template to create a diagnostic setting for control plane logs. Modify the template to collect different categories or to send the logs to a different destination.

provider "azurerm" {
  features {}
}

variable "setting_name" {
  type        = string
  description = "Name for the diagnostic setting."
}

variable "workspace_id" {
  type        = string
  description = "Resource ID of the Log Analytics workspace."
}

variable "cluster_id" {
  type        = string
  description = "Resource ID of the AKS cluster to attach diagnostics to."
}

resource "azurerm_monitor_diagnostic_setting" "aks" {
  name                       = var.setting_name
  target_resource_id         = var.cluster_id
  log_analytics_workspace_id = var.workspace_id

  log {
    category = "kube-apiserver"
    enabled  = true
  }

  log {
    category = "kube-audit"
    enabled  = true
  }

  log {
    category = "kube-audit-admin"
    enabled  = true
  }

  log {
    category = "kube-controller-manager"
    enabled  = true
  }

  log {
    category = "kube-scheduler"
    enabled  = false
  }

  log {
    category = "cluster-autoscaler"
    enabled  = true
  }

  log {
    category = "guard"
    enabled  = true
  }
}

Enable Windows metrics (Preview)

Windows metric collection is enabled for AKS clusters as of version 6.4.0-main-02-22-2023-3ee44b9e of the Managed Prometheus addon container. Onboarding to the Azure Monitor Metrics add-on enables the Windows DaemonSet pods to start running on your node pools. Both Windows Server 2019 and Windows Server 2022 are supported. Follow these steps to enable the pods to collect metrics from your Windows node pools.

Note

The windows-exporter-daemonset.yaml file doesn't set CPU or memory limits, so it might overprovision the Windows nodes. For details, see Resource reservation.

As you deploy workloads, set resource memory and CPU limits on containers. This also subtracts from NodeAllocatable and helps the cluster-wide scheduler in determining which pods to place on which nodes. Scheduling pods without limits might overprovision the Windows nodes and in extreme cases can cause the nodes to become unhealthy.

Install Windows exporter

Manually install windows-exporter on AKS nodes to access Windows metrics by deploying the windows-exporter-daemonset YAML file. Enable the following collectors. For more collectors, see Prometheus exporter for Windows metrics.

[defaults]
- container
- memory
- process
- cpu_info

Deploy the windows-exporter-daemonset YAML file. If any taints are applied to the nodes, you need to add the appropriate tolerations.

kubectl apply -f windows-exporter-daemonset.yaml

Enable Windows metrics

Set the windowsexporter and windowskubeproxy Booleans to true in your metrics settings ConfigMap and apply it to the cluster. See Customize collection of Prometheus metrics from your Kubernetes cluster using ConfigMap.

Enable recording rules

Enable the recording rules that are required for the out-of-the-box dashboards:

If onboarding using CLI, include the option --enable-windows-recording-rules.
If onboarding using an ARM template, Bicep, or Azure Policy, set enableWindowsRecordingRules to true in the parameters file.
If the cluster is already onboarded, use this ARM template and this parameter file to create the rule groups. This deployment adds the required recording rules. It isn't an ARM operation on the cluster and doesn't affect the cluster's current monitoring state.

Verify deployment

Use the kubectl command line tool to verify that the agent is deployed properly.

Managed Prometheus

Verify that the DaemonSet was deployed properly on the Linux node pools

kubectl get ds ama-metrics-node --namespace=kube-system

The number of pods should be equal to the number of Linux nodes on the cluster. The output should resemble the following example:

User@aksuser:~$ kubectl get ds ama-metrics-node --namespace=kube-system
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ama-metrics-node   1         1         1       1            1           <none>          10h

Verify that Windows nodes were deployed properly

kubectl get ds ama-metrics-win-node --namespace=kube-system

The number of pods should be equal to the number of Windows nodes on the cluster. The output should resemble the following example:

User@aksuser:~$ kubectl get ds ama-metrics-win-node --namespace=kube-system
NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ama-metrics-win-node   3         3         3       3            3           <none>          10h

Verify that the two ReplicaSets were deployed for Prometheus

kubectl get rs --namespace=kube-system

The output should resemble the following example:

User@aksuser:~$ kubectl get rs --namespace=kube-system
NAME                            DESIRED   CURRENT   READY   AGE
ama-metrics-5c974985b8          1         1         1       11h
ama-metrics-ksm-5fcf8dffcd      1         1         1       11h

Container insights and logging

Verify that the DaemonSets were deployed properly on the Linux node pools

kubectl get ds ama-logs --namespace=kube-system

The number of pods should be equal to the number of Linux nodes on the cluster. The output should resemble the following example:

User@aksuser:~$ kubectl get ds ama-logs --namespace=kube-system
NAME       DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ama-logs   2         2         2         2            2           <none>          1d

Verify that Windows nodes were deployed properly

kubectl get ds ama-logs-windows --namespace=kube-system

The number of pods should be equal to the number of Windows nodes on the cluster. The output should resemble the following example:

User@aksuser:~$ kubectl get ds ama-logs-windows --namespace=kube-system
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ama-logs-windows   2         2         2       2            2           <none>          1d

Verify deployment of the container logging solution

kubectl get deployment ama-logs-rs --namespace=kube-system

The output should resemble the following example:

User@aksuser:~$ kubectl get deployment ama-logs-rs --namespace=kube-system
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
ama-logs-rs   1/1     1            1           24d
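To automate the DaemonSet checks above, you could compare the desired and ready pod counts in the JSON form of the same commands. The following Python sketch assumes output captured from a command such as kubectl get ds ama-logs --namespace=kube-system -o json. The function name and the trimmed sample are illustrative.

```python
import json


def daemonset_is_ready(daemonset: dict) -> bool:
    """Return True when every pod the DaemonSet wants to schedule reports ready."""
    status = daemonset.get("status", {})
    desired = status.get("desiredNumberScheduled")
    ready = status.get("numberReady")
    return desired is not None and desired == ready


# Trimmed, hypothetical sample that mirrors the ama-logs example above (two Linux nodes).
sample = json.loads("""
{
  "kind": "DaemonSet",
  "metadata": {"name": "ama-logs", "namespace": "kube-system"},
  "status": {"desiredNumberScheduled": 2, "currentNumberScheduled": 2, "numberReady": 2}
}
""")

print(daemonset_is_ready(sample))  # True
```

The same function can be applied to ama-metrics-node, ama-metrics-win-node, and ama-logs-windows, because all four are DaemonSets.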

View configuration with CLI

Use the az aks show command to find out whether the solution is enabled, the Log Analytics workspace resource ID, and summary information about the cluster.

az aks show --resource-group <resourceGroupofAKSCluster> --name <nameofAksCluster>

The command returns JSON-formatted information about the solution. The addonProfiles section should include information on the omsagent as in the following example:

"addonProfiles": {
    "omsagent": {
        "config": {
            "logAnalyticsWorkspaceResourceID": "/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/my-resource-group/providers/microsoft.operationalinsights/workspaces/my-workspace",
            "useAADAuth": "true"
        },
        "enabled": true,
        "identity": null
    }
}
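If you script this check, you can pull the relevant fields out of the az aks show output rather than reading the JSON by eye. The following Python sketch assumes the key names shown in the example above. The function name is hypothetical, and the sample wraps the documented fragment in braces so that it parses as a complete JSON document.

```python
import json


def container_insights_status(cluster: dict) -> dict:
    """Summarize the omsagent add-on profile from parsed az aks show output."""
    addon = (cluster.get("addonProfiles") or {}).get("omsagent") or {}
    config = addon.get("config") or {}
    return {
        "enabled": bool(addon.get("enabled")),
        "workspace_id": config.get("logAnalyticsWorkspaceResourceID"),
        # useAADAuth is returned as the string "true" in the example, not a boolean.
        "uses_aad_auth": str(config.get("useAADAuth", "")).lower() == "true",
    }


sample_output = json.loads("""
{
  "addonProfiles": {
    "omsagent": {
      "config": {
        "logAnalyticsWorkspaceResourceID": "/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/my-resource-group/providers/microsoft.operationalinsights/workspaces/my-workspace",
        "useAADAuth": "true"
      },
      "enabled": true,
      "identity": null
    }
  }
}
""")

status = container_insights_status(sample_output)
print(status["enabled"], status["uses_aad_auth"])  # True True
```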

If you experience issues attempting to onboard, review the Troubleshooting guide.
Learn how to Analyze Kubernetes monitoring data in the Azure portal Container insights.

Last updated on 2026-04-28
