This article describes how to enable complete monitoring of your Kubernetes clusters using the following Azure Monitor features:
- Managed Prometheus for metric collection
- Container insights for log collection
- Managed Grafana for visualization.
Using the Azure portal, you can enable all of the features at the same time. You can also enable them individually by using the Azure CLI, Azure Resource Manager template, Terraform, or Azure Policy. Each of these methods is described in this article.
Important
Kubernetes clusters generate a lot of log data, which can result in significant costs if you aren't selective about the logs that you collect. Before you enable monitoring for your cluster, see the following articles to ensure that your environment is optimized for cost and that you limit your log collection to only the data that you require:
- Configure data collection and cost optimization in Container insights using data collection rule
  Details on customizing log collection once you've enabled monitoring, including using preset cost optimization configurations.
- Best practices for monitoring Kubernetes with Azure Monitor
  Best practices for monitoring Kubernetes clusters organized by the five pillars of the Azure Well-Architected Framework, including cost optimization.
- Cost optimization in Azure Monitor
  Best practices for configuring all features of Azure Monitor to optimize your costs and limit the amount of data that you collect.
This article provides onboarding guidance for the following types of clusters. Any differences in the process for each type are noted in the relevant sections.
Permissions
- You require at least Contributor access to the cluster for onboarding.
- You require Monitoring Reader or Monitoring Contributor to view data after monitoring is enabled.
Managed Prometheus prerequisites
- The cluster must use managed identity authentication.
- The following resource providers must be registered in the subscription of the AKS cluster and the Azure Monitor workspace:
  - Microsoft.ContainerService
  - Microsoft.Insights
  - Microsoft.AlertsManagement
  - Microsoft.Monitor
- The following resource provider must be registered in the subscription of the Grafana workspace:
  - Microsoft.Dashboard
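As a rough sketch, the registrations listed above could be scripted by generating one `az provider register` call per namespace. The helper name `registration_commands` is hypothetical, and the script only prints the commands so that you can review them before running them in an authenticated Azure CLI session.

```python
import shlex

# Resource providers named in the prerequisites above.
CLUSTER_AND_WORKSPACE_PROVIDERS = [
    "Microsoft.ContainerService",
    "Microsoft.Insights",
    "Microsoft.AlertsManagement",
    "Microsoft.Monitor",
]
GRAFANA_PROVIDERS = ["Microsoft.Dashboard"]


def registration_commands(namespaces, subscription=None):
    """Build one `az provider register` argument list per namespace (hypothetical helper)."""
    commands = []
    for namespace in namespaces:
        command = ["az", "provider", "register", "--namespace", namespace]
        if subscription:
            # Target a specific subscription instead of the CLI default.
            command += ["--subscription", subscription]
        commands.append(command)
    return commands


# Print the commands for review; nothing is executed here.
for command in registration_commands(CLUSTER_AND_WORKSPACE_PROVIDERS + GRAFANA_PROVIDERS):
    print(shlex.join(command))
```

If the Grafana workspace is in a different subscription, pass that subscription's ID as `subscription` when building the `Microsoft.Dashboard` command.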
Arc-Enabled Kubernetes clusters prerequisites
- Prerequisites for Azure Arc-enabled Kubernetes cluster extensions.
- Verify the firewall requirements in addition to the Azure Arc-enabled Kubernetes network requirements.
- If you previously installed monitoring for AKS, ensure that you have disabled monitoring before proceeding to avoid issues during the extension install.
- If you previously installed monitoring on a cluster using a script without cluster extensions, follow the instructions at Disable monitoring of your Kubernetes cluster to delete this Helm chart.
Note
The Managed Prometheus Arc-Enabled Kubernetes (preview) extension does not support the following configurations:
- Red Hat OpenShift distributions, including Azure Red Hat OpenShift (ARO)
- Windows nodes
The following table describes the workspaces that are required to support Managed Prometheus and Container insights. You can create each workspace as part of the onboarding process or use an existing workspace. See Design a Log Analytics workspace architecture for guidance on how many workspaces to create and where they should be placed.
Feature | Workspace | Notes |
---|---|---|
Managed Prometheus | Azure Monitor workspace | Contributor permission is sufficient to enable the add-on to send data to the Azure Monitor workspace. You need Owner-level permission to link your Azure Monitor workspace to view metrics in Azure Managed Grafana, because the user who runs the onboarding step must be able to assign the Monitoring Reader role on the Azure Monitor workspace to the Azure Managed Grafana system identity so that it can query the metrics. |
Container insights | Log Analytics workspace | You can attach an AKS cluster to a Log Analytics workspace in a different Azure subscription in the same Microsoft Entra tenant, but you must use the Azure CLI or an Azure Resource Manager template. You can't currently perform this configuration with the Azure portal. If you're connecting an existing AKS cluster to a Log Analytics workspace in another subscription, the Microsoft.ContainerService resource provider must be registered in the subscription with the Log Analytics workspace. For more information, see Register resource provider. |
Managed Grafana | Azure Managed Grafana workspace | Link your Grafana workspace to your Azure Monitor workspace to make the Prometheus metrics collected from your cluster available to Grafana dashboards. |
Use one of the following methods to enable scraping of Prometheus metrics from your cluster and enable Managed Grafana to visualize the metrics. See Link a Grafana workspace for options to connect your Azure Monitor workspace and Azure Managed Grafana workspace.
Note
If you have a single Azure Monitor resource that is private-linked, Prometheus enablement won't work if the AKS cluster and Azure Monitor workspace are in different regions. The configuration needed for the Prometheus add-on isn't available across regions because of the private link constraint. To resolve this, create a new data collection endpoint (DCE) in the AKS cluster's region and a new data collection rule association (DCRA) in the same region. Associate the new DCE with the AKS cluster and name the new DCRA configurationAccessEndpoint. For full instructions on how to configure the DCEs associated with your Azure Monitor workspace to use a private link for data ingestion, see Enable private link for Kubernetes monitoring in Azure Monitor.
- The Azure Monitor workspace and Azure Managed Grafana instance must already be created.
- The template must be deployed in the same resource group as the Azure Managed Grafana instance.
- If the Azure Managed Grafana instance is in a subscription other than the Azure Monitor workspace subscription, register the Azure Monitor workspace subscription with the Microsoft.Dashboard resource provider using the guidance at Register resource provider.
- Users with the User Access Administrator role in the subscription of the AKS cluster can enable the Monitoring Reader role directly by deploying the template.
Note
Currently in Bicep, there's no way to explicitly scope the Monitoring Reader role assignment on a string parameter "resource ID" for an Azure Monitor workspace like in an ARM template. Bicep expects a value of type resource | tenant. There is also no REST API spec for an Azure Monitor workspace.
Therefore, the default scoping for the Monitoring Reader role is on the resource group. The role is applied on the same Azure Monitor workspace by inheritance, which is the expected behavior. After you deploy this Bicep template, the Grafana instance is given Monitoring Reader permissions for all the Azure Monitor workspaces in that resource group.
If the Azure Managed Grafana instance is already linked to an Azure Monitor workspace, then you must include this list in the template. On the Overview page for the Azure Managed Grafana instance in the Azure portal, select JSON view, and copy the value of azureMonitorWorkspaceIntegrations, which will look similar to the sample below. If it doesn't exist, then the instance hasn't been linked with any Azure Monitor workspace.
"properties": {
  "grafanaIntegrations": {
    "azureMonitorWorkspaceIntegrations": [
      {
        "azureMonitorWorkspaceResourceId": "full_resource_id_1"
      },
      {
        "azureMonitorWorkspaceResourceId": "full_resource_id_2"
      }
    ]
  }
}
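The following is a minimal sketch of how the existing links could be read from a saved copy of the JSON view and combined with the workspace you're adding, so that you can preview the list the template should end up with. The function names are hypothetical, and comparing IDs case-insensitively is an assumption based on Azure resource IDs not being case-sensitive.

```python
import json


def existing_workspace_ids(grafana_json_view):
    """Return the Azure Monitor workspace IDs already linked to a Grafana instance.

    `grafana_json_view` is the parsed JSON view of the Grafana resource.
    A missing azureMonitorWorkspaceIntegrations value means nothing is linked yet.
    """
    properties = grafana_json_view.get("properties") or {}
    integrations = properties.get("grafanaIntegrations") or {}
    links = integrations.get("azureMonitorWorkspaceIntegrations") or []
    return [link["azureMonitorWorkspaceResourceId"] for link in links]


def merged_integrations(grafana_json_view, new_workspace_id):
    """Keep the existing links and append the new workspace only if it isn't linked yet."""
    ids = existing_workspace_ids(grafana_json_view)
    if new_workspace_id.lower() not in (existing.lower() for existing in ids):
        ids.append(new_workspace_id)
    return [{"azureMonitorWorkspaceResourceId": workspace_id} for workspace_id in ids]


# Placeholder IDs taken from the sample above; "full_resource_id_3" is hypothetical.
sample = json.loads("""
{
  "properties": {
    "grafanaIntegrations": {
      "azureMonitorWorkspaceIntegrations": [
        {"azureMonitorWorkspaceResourceId": "full_resource_id_1"},
        {"azureMonitorWorkspaceResourceId": "full_resource_id_2"}
      ]
    }
  }
}
""")
print(json.dumps(merged_integrations(sample, "full_resource_id_3"), indent=2))
```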
Download the required files for the type of Kubernetes cluster you're working with.
AKS cluster ARM
- Template file: https://aka.ms/azureprometheus-enable-arm-template
- Parameter file: https://aka.ms/azureprometheus-enable-arm-template-parameters
AKS cluster Bicep
- Template file: https://aka.ms/azureprometheus-enable-bicep-template
- Parameter file: https://aka.ms/azureprometheus-enable-bicep-template-parameters
- DCRA module: https://aka.ms/nested_azuremonitormetrics_dcra_clusterResourceId
- Profile module: https://aka.ms/nested_azuremonitormetrics_profile_clusterResourceId
- Azure Managed Grafana Role Assignment module: https://aka.ms/nested_grafana_amw_role_assignment
Arc-Enabled cluster (preview) ARM
- Template file: https://aka.ms/azureprometheus-arc-arm-template
- Parameter file: https://aka.ms/azureprometheus-arc-arm-template-parameters
Edit the following values in the parameter file. The same set of values is used for both the ARM and Bicep templates. Retrieve the resource ID of the resources from the JSON View of their Overview page.
Parameter | Value |
---|---|
azureMonitorWorkspaceResourceId | Resource ID for the Azure Monitor workspace. Retrieve from the JSON view on the Overview page for the Azure Monitor workspace. |
azureMonitorWorkspaceLocation | Location of the Azure Monitor workspace. Retrieve from the JSON view on the Overview page for the Azure Monitor workspace. |
clusterResourceId | Resource ID for the AKS cluster. Retrieve from the JSON view on the Overview page for the cluster. |
clusterLocation | Location of the AKS cluster. Retrieve from the JSON view on the Overview page for the cluster. |
metricLabelsAllowlist | Comma-separated list of Kubernetes label keys to be used in the resource's labels metric. |
metricAnnotationsAllowList | Comma-separated list of Kubernetes annotation keys to be used in the resource's annotations metric. |
grafanaResourceId | Resource ID for the managed Grafana instance. Retrieve from the JSON view on the Overview page for the Grafana instance. |
grafanaLocation | Location for the managed Grafana instance. Retrieve from the JSON view on the Overview page for the Grafana instance. |
grafanaSku | SKU for the managed Grafana instance. Retrieve from the JSON view on the Overview page for the Grafana instance. Use the sku.name. |
Open the template file and update the grafanaIntegrations property at the end of the file with the values that you retrieved from the Grafana instance. This will look similar to the following samples. In these samples, full_resource_id_1 and full_resource_id_2 were already in the Azure Managed Grafana resource JSON. The final azureMonitorWorkspaceResourceId entry is already in the template and is used to link to the Azure Monitor workspace resource ID provided in the parameters file.
ARM
{
  "type": "Microsoft.Dashboard/grafana",
  "apiVersion": "2022-08-01",
  "name": "[split(parameters('grafanaResourceId'),'/')[8]]",
  "sku": {
    "name": "[parameters('grafanaSku')]"
  },
  "location": "[parameters('grafanaLocation')]",
  "properties": {
    "grafanaIntegrations": {
      "azureMonitorWorkspaceIntegrations": [
        {
          "azureMonitorWorkspaceResourceId": "full_resource_id_1"
        },
        {
          "azureMonitorWorkspaceResourceId": "full_resource_id_2"
        },
        {
          "azureMonitorWorkspaceResourceId": "[parameters('azureMonitorWorkspaceResourceId')]"
        }
      ]
    }
  }
}
Bicep
resource grafanaResourceId_8 'Microsoft.Dashboard/grafana@2022-08-01' = {
  name: split(grafanaResourceId, '/')[8]
  sku: {
    name: grafanaSku
  }
  identity: {
    type: 'SystemAssigned'
  }
  location: grafanaLocation
  properties: {
    grafanaIntegrations: {
      azureMonitorWorkspaceIntegrations: [
        {
          azureMonitorWorkspaceResourceId: 'full_resource_id_1'
        }
        {
          azureMonitorWorkspaceResourceId: 'full_resource_id_2'
        }
        {
          azureMonitorWorkspaceResourceId: azureMonitorWorkspaceResourceId
        }
      ]
    }
  }
}
Deploy the template with the parameter file by using any valid method for deploying Resource Manager templates. For examples of different methods, see Deploy the sample templates.
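Both samples derive the Grafana resource name by splitting grafanaResourceId on '/' and taking element 8, which only works when the parameter holds the full resource ID, including the leading slash. The sketch below mimics that lookup so you can check a value before deploying. The function name and the sample ID are hypothetical, and the validation is an illustration rather than anything the templates perform.

```python
def grafana_name_from_resource_id(resource_id):
    """Mimic split(grafanaResourceId, '/')[8] from the templates, with basic validation."""
    segments = resource_id.split("/")
    # A full ID looks like:
    # /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Dashboard/grafana/<name>
    # The leading slash produces an empty first segment, so the name is at index 8.
    looks_valid = (
        len(segments) == 9
        and segments[0] == ""
        and segments[6].lower() == "microsoft.dashboard"
        and segments[7].lower() == "grafana"
    )
    if not looks_valid:
        raise ValueError(f"Not a full Azure Managed Grafana resource ID: {resource_id!r}")
    return segments[8]


# Hypothetical resource ID used only for illustration.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-resource-group"
    "/providers/Microsoft.Dashboard/grafana/my-grafana"
)
print(grafana_name_from_resource_id(example_id))
```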
Use one of the following methods to enable Container insights on your cluster. Once this is complete, see Configure agent data collection for Container insights to customize your configuration to ensure that you aren't collecting more data than you require.
Both ARM and Bicep templates are provided in this section.
- The template must be deployed in the same resource group as the cluster.
Download and edit template and parameter file
AKS cluster ARM
- Template file: https://aka.ms/aks-enable-monitoring-msi-onboarding-template-file
- Parameter file: https://aka.ms/aks-enable-monitoring-msi-onboarding-template-parameter-file
AKS cluster Bicep
- Template file (Syslog): https://aka.ms/enable-monitoring-msi-syslog-bicep-template
- Parameter file (Syslog): https://aka.ms/enable-monitoring-msi-syslog-bicep-parameters
- Template file (No Syslog): https://aka.ms/enable-monitoring-msi-bicep-template
- Parameter file (No Syslog): https://aka.ms/enable-monitoring-msi-bicep-parameters
Arc-enabled cluster ARM
- Template file: https://aka.ms/arc-k8s-azmon-extension-msi-arm-template
- Parameter file: https://aka.ms/arc-k8s-azmon-extension-msi-arm-template-params
- Template file (legacy authentication): https://aka.ms/arc-k8s-azmon-extension-arm-template
- Parameter file (legacy authentication): https://aka.ms/arc-k8s-azmon-extension-arm-template-params
Edit the following values in the parameter file. The same set of values is used for both the ARM and Bicep templates. Retrieve the resource ID of the resources from the JSON View of their Overview page.
Parameter | Description |
---|---|
AKS: aksResourceId; Arc: clusterResourceId | Resource ID of the cluster. |
AKS: aksResourceLocation; Arc: clusterRegion | Location of the cluster. |
AKS: workspaceResourceId; Arc: workspaceResourceId | Resource ID of the Log Analytics workspace. |
Arc: workspaceRegion | Region of the Log Analytics workspace. |
Arc: workspaceDomain | Domain of the Log Analytics workspace: opinsights.azure.com for Azure public cloud, opinsights.azure.cn for Microsoft Azure operated by 21Vianet. |
AKS: resourceTagValues | Tag values specified for the existing Container insights extension data collection rule (DCR) of the cluster and the name of the DCR. The name will be MSCI-<clusterName>-<clusterRegion>, and this resource is created in the AKS cluster's resource group. For first-time onboarding, you can set arbitrary tag values. |
Deploy the template with the parameter file by using any valid method for deploying Resource Manager templates. For examples of different methods, see Deploy the sample templates.
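If you prefer to fill in the downloaded parameter file with a script rather than by hand, a sketch along these lines could work. It assumes the standard ARM parameter file layout, where each entry under `parameters` is an object with a `value` key. The function name, the placeholder resource IDs, and the shape of `resourceTagValues` as a tag-name-to-value object are assumptions, so compare them with the file you downloaded.

```python
import json


def apply_parameter_values(parameter_file, values):
    """Return a copy of a parsed ARM parameter file with the given parameter values set."""
    updated = json.loads(json.dumps(parameter_file))  # deep copy of JSON-compatible data
    parameters = updated.setdefault("parameters", {})
    for name, value in values.items():
        # Preserve any parameters already in the file; only set the "value" of those given.
        parameters.setdefault(name, {})["value"] = value
    return updated


# Stand-in for the downloaded AKS parameter file; load the real one with json.load().
downloaded = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {},
}

# Placeholder values for the AKS parameters described in the table above.
aks_values = {
    "aksResourceId": "<resource-id-of-the-aks-cluster>",
    "aksResourceLocation": "<location-of-the-aks-cluster>",
    "workspaceResourceId": "<resource-id-of-the-log-analytics-workspace>",
    "resourceTagValues": {"<tag-name>": "<tag-value>"},
}

print(json.dumps(apply_parameter_values(downloaded, aks_values), indent=2))
```

For an Arc-enabled cluster, the same helper applies with the Arc parameter names from the table, such as clusterResourceId and clusterRegion.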
When you create a new AKS cluster in the Azure portal, you can enable Prometheus, Container insights, and Grafana from the Monitoring tab. Make sure that you check the Enable Container Logs, Enable Prometheus metrics, and Enable Grafana checkboxes.
- Navigate to your AKS cluster in the Azure portal.
- In the service menu, under Monitoring, select Insights > Configure monitoring.
- Container insights is already enabled. Select the Enable Prometheus metrics and Enable Grafana checkboxes. If you have an existing Azure Monitor workspace and Grafana workspace, they're selected for you.
- Select Advanced settings if you want to select alternate workspaces or create new ones. The Cost presets setting allows you to modify the default collection details to reduce your monitoring costs. See Enable cost optimization settings in Container insights for details.
- Select Configure.
- Navigate to your AKS cluster in the Azure portal.
- In the service menu, under Monitoring, select Insights > Configure monitoring.
- Select the Enable Prometheus metrics checkbox.
- Select Advanced settings if you want to select alternate workspaces or create new ones. The Cost presets setting allows you to modify the default collection details to reduce your monitoring costs.
- Select Configure.
Note
There is no CPU or memory limit in windows-exporter-daemonset.yaml, so it may over-provision the Windows nodes.
For more details, see Resource reservation.
As you deploy workloads, set resource memory and CPU limits on containers. This also subtracts from NodeAllocatable and helps the cluster-wide scheduler in determining which pods to place on which nodes. Scheduling pods without limits may over-provision the Windows nodes and in extreme cases can cause the nodes to become unhealthy.
As of version 6.4.0-main-02-22-2023-3ee44b9e of the Managed Prometheus addon container (prometheus_collector), Windows metric collection has been enabled for the AKS clusters. Onboarding to the Azure Monitor Metrics add-on enables the Windows DaemonSet pods to start running on your node pools. Both Windows Server 2019 and Windows Server 2022 are supported. Follow these steps to enable the pods to collect metrics from your Windows node pools.
Manually install windows-exporter on AKS nodes to access Windows metrics by deploying the windows-exporter-daemonset YAML file. Enable the following collectors:
[defaults]
container
memory
process
cpu_info
For more collectors, please see Prometheus exporter for Windows metrics.
Deploy the windows-exporter-daemonset YAML file. If any taints are applied to the nodes, you need to apply the appropriate tolerations.
kubectl apply -f windows-exporter-daemonset.yaml
Apply the ama-metrics-settings-configmap to your cluster. Set the windowsexporter and windowskubeproxy Booleans to true. For more information, see Metrics add-on settings configmap.
Enable the recording rules that are required for the out-of-the-box dashboards:
- If onboarding using the CLI, include the option --enable-windows-recording-rules.
- If onboarding using an ARM template, Bicep, or Azure Policy, set enableWindowsRecordingRules to true in the parameters file.
- If the cluster is already onboarded, use this ARM template and this parameter file to create the rule groups. This adds the required recording rules. It isn't an ARM operation on the cluster and doesn't affect the current monitoring state of the cluster.
Use the kubectl command line tool to verify that the agent is deployed properly.
Verify that the DaemonSet was deployed properly on the Linux node pools
kubectl get ds ama-metrics-node --namespace=kube-system
The number of pods should be equal to the number of Linux nodes on the cluster. The output should resemble the following example:
User@aksuser:~$ kubectl get ds ama-metrics-node --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ama-metrics-node 1 1 1 1 1 <none> 10h
Verify that Windows nodes were deployed properly
kubectl get ds ama-metrics-win-node --namespace=kube-system
The number of pods should be equal to the number of Windows nodes on the cluster. The output should resemble the following example:
User@aksuser:~$ kubectl get ds ama-metrics-win-node --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ama-metrics-win-node 3 3 3 3 3 <none> 10h
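The comparison described above (pod count equal to node count) can also be checked from captured `kubectl get ds` output. The following is a minimal sketch that parses the plain-text table and compares the DESIRED and READY columns. The function names are hypothetical, and it assumes the default column layout shown in the examples.

```python
def daemonset_status(kubectl_output):
    """Parse `kubectl get ds` text output into {name: (desired, ready)}."""
    rows = [line.split() for line in kubectl_output.splitlines() if line.strip()]
    # Skip anything before the header row, such as an echoed shell prompt.
    start = next(i for i, fields in enumerate(rows) if fields[0] == "NAME")
    header = rows[start]
    desired_col = header.index("DESIRED")
    ready_col = header.index("READY")
    status = {}
    for fields in rows[start + 1:]:
        status[fields[0]] = (int(fields[desired_col]), int(fields[ready_col]))
    return status


def all_ready(status):
    """True when every DaemonSet has as many ready pods as desired pods."""
    return all(desired == ready for desired, ready in status.values())


# Output copied from the Windows example above.
sample_output = """\
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ama-metrics-win-node 3 3 3 3 3 <none> 10h
"""
status = daemonset_status(sample_output)
print(status, all_ready(status))
```

DESIRED reflects the number of nodes that the DaemonSet is scheduled on, so you should still compare it with the number of Linux or Windows nodes in the cluster.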
Verify that the two ReplicaSets were deployed for Prometheus
kubectl get rs --namespace=kube-system
The output should resemble the following example:
User@aksuser:~$ kubectl get rs --namespace=kube-system
NAME DESIRED CURRENT READY AGE
ama-metrics-5c974985b8 1 1 1 11h
ama-metrics-ksm-5fcf8dffcd 1 1 1 11h
Verify that the DaemonSets were deployed properly on the Linux node pools
kubectl get ds ama-logs --namespace=kube-system
The number of pods should be equal to the number of Linux nodes on the cluster. The output should resemble the following example:
User@aksuser:~$ kubectl get ds ama-logs --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ama-logs 2 2 2 2 2 <none> 1d
Verify that Windows nodes were deployed properly
kubectl get ds ama-logs-windows --namespace=kube-system
The number of pods should be equal to the number of Windows nodes on the cluster. The output should resemble the following example:
User@aksuser:~$ kubectl get ds ama-logs-windows --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ama-logs-windows 2 2 2 2 2 <none> 1d
Verify deployment of the Container insights solution
kubectl get deployment ama-logs-rs --namespace=kube-system
The output should resemble the following example:
User@aksuser:~$ kubectl get deployment ama-logs-rs --namespace=kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
ama-logs-rs 1/1 1 1 24d
View configuration with CLI
Use the az aks show command to find out whether the solution is enabled, the Log Analytics workspace resource ID, and summary information about the cluster.
az aks show --resource-group <resourceGroupofAKSCluster> --name <nameofAksCluster>
The command will return JSON-formatted information about the solution. The addonProfiles section should include information on the omsagent as in the following example:
"addonProfiles": {
  "omsagent": {
    "config": {
      "logAnalyticsWorkspaceResourceID": "/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/my-resource-group/providers/microsoft.operationalinsights/workspaces/my-workspace",
      "useAADAuth": "true"
    },
    "enabled": true,
    "identity": null
  }
}
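To avoid reading the full JSON by eye, the relevant fields can be pulled out of the command's output with a short script. This is a sketch under the assumption that the output has the shape shown in the example above. The function name and the summary keys are hypothetical.

```python
import json


def container_insights_summary(aks_show_output):
    """Summarize the omsagent add-on from parsed `az aks show` JSON output."""
    profile = (aks_show_output.get("addonProfiles") or {}).get("omsagent") or {}
    config = profile.get("config") or {}
    return {
        "enabled": bool(profile.get("enabled")),
        "workspace_resource_id": config.get("logAnalyticsWorkspaceResourceID"),
        # The example reports this flag as the string "true", not a JSON Boolean.
        "use_aad_auth": str(config.get("useAADAuth", "")).lower() == "true",
    }


# Trimmed-down stand-in for the command output, based on the example above.
sample = json.loads("""
{
  "addonProfiles": {
    "omsagent": {
      "config": {
        "logAnalyticsWorkspaceResourceID": "/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/my-resource-group/providers/microsoft.operationalinsights/workspaces/my-workspace",
        "useAADAuth": "true"
      },
      "enabled": true,
      "identity": null
    }
  }
}
""")
print(json.dumps(container_insights_summary(sample), indent=2))
```

In practice you would save the command output to a file and load it with `json.load` instead of using the inline sample.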
When you enable monitoring, the following resources are created in your subscription:
Resource Name | Resource Type | Resource Group | Region/Location | Description |
---|---|---|---|---|
MSCI-<aksclusterregion>-<clustername> | Data Collection Rule | Same as cluster | Same as Log Analytics workspace | This data collection rule is for log collection by the Azure Monitor agent. It uses the Log Analytics workspace as its destination and is associated with the AKS cluster resource. |
MSPROM-<aksclusterregion>-<clustername> | Data Collection Rule | Same as cluster | Same as Azure Monitor workspace | This data collection rule is for Prometheus metrics collection by the metrics add-on. It uses the chosen Azure Monitor workspace as its destination and is associated with the AKS cluster resource. |
MSPROM-<aksclusterregion>-<clustername> | Data Collection Endpoint | Same as cluster | Same as Azure Monitor workspace | This data collection endpoint is used by the preceding data collection rule to ingest Prometheus metrics from the metrics add-on. |
When you create a new Azure Monitor workspace, the following additional resources are created as part of it:
Resource Name | Resource Type | Resource Group | Region/Location | Description |
---|---|---|---|---|
<azuremonitor-workspace-name> | Data Collection Rule | MA_<azuremonitor-workspace-name>_<azuremonitor-workspace-region>_managed | Same as Azure Monitor Workspace | DCR created when you use OSS Prometheus server to Remote Write to Azure Monitor Workspace. |
<azuremonitor-workspace-name> | Data Collection Endpoint | MA_<azuremonitor-workspace-name>_<azuremonitor-workspace-region>_managed | Same as Azure Monitor Workspace | DCE created when you use OSS Prometheus server to Remote Write to Azure Monitor Workspace. |
The main differences in monitoring a Windows Server cluster compared to a Linux cluster include:
- Windows doesn't have a Memory RSS metric. As a result, it isn't available for Windows nodes and containers. The Working Set metric is available.
- Disk storage capacity information isn't available for Windows nodes.
- Only pod environments are monitored, not Docker environments.
- With the preview release, a maximum of 30 Windows Server containers are supported. This limitation doesn't apply to Linux containers.
Note
Container insights support for the Windows Server 2022 operating system is in preview.
The containerized Linux agent (replicaset pod) makes API calls to all the Windows nodes on Kubelet secure port (10250) within the cluster to collect node and container performance-related metrics. Kubelet secure port (:10250) should be opened in the cluster's virtual network for both inbound and outbound for Windows node and container performance-related metrics collection to work.
If you have a Kubernetes cluster with Windows nodes, review and configure the network security group and network policies to make sure the Kubelet secure port (:10250) is open for both inbound and outbound in the cluster's virtual network.
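One way to spot-check the port requirement is a plain TCP connection test, run from a machine or pod inside the cluster's virtual network. The sketch below only confirms that a TCP connection to port 10250 can be opened. It doesn't authenticate to the kubelet or prove that metrics collection works, and the function name is hypothetical.

```python
import socket


def tcp_port_open(host, port=10250, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, `tcp_port_open("10.240.0.4")` would test a Windows node at that address, where the IP is a placeholder for one of your own node addresses. A False result points to a network security group rule, a network policy, or routing as things to review.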
- If you experience issues while you attempt to onboard the solution, review the Troubleshooting guide.
- With monitoring enabled to collect health and resource utilization of your AKS cluster and the workloads running on it, learn how to use Container insights.