Best practices for Azure Monitor alerts

This article provides architectural best practices for Azure Monitor alerts, alert processing rules, and action groups. The guidance is based on the five pillars of architecture excellence described in the Azure Well-Architected Framework.

For more information about alerts and notifications, see Azure Monitor alerts overview. For more information about alerting at-scale solutions, see Alerting at-scale.

Reliability

In the cloud, we acknowledge that failures happen. Instead of trying to prevent failures altogether, the goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Azure Monitor alert rule components.

Azure Monitor alerts offer a high degree of reliability without any design decisions. Conditions where a temporary loss of alert data may occur are often mitigated by features of other Azure Monitor components.

Design checklist

  • Configure service health alert rules.
  • Configure resource health alert rules.
  • Avoid service limits for alert rules that produce large-scale notifications.

Configuration recommendations

Recommendation | Benefit
Configure service health alert rules. | Service health alerts notify you of outages, service disruptions, planned maintenance, and security advisories. See Create or edit an alert rule.
Configure resource health alert rules. | Resource health alerts can notify you in near real time when your resources have a change in their health status. See Create or edit an alert rule.
Avoid service limits for alert rules that produce large-scale notifications. | If you have alert rules that send a large number of notifications, you may reach the limits of the service you use to send email or SMS notifications. Configure programmatic actions, or choose an alternate notification method or provider, to handle large-scale notifications. See Service limits for notifications.
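
As a rough illustration of the first two recommendations, the sketch below assembles request bodies for activity log alert rules that filter on the ServiceHealth and ResourceHealth event categories. The helper name, the placeholder IDs, and the resource group and action group names are assumptions made for this example. Check the body against the current Microsoft.Insights/activityLogAlerts schema before you deploy it.

```python
import json


def health_alert_rule(category: str, subscription_id: str, action_group_id: str) -> dict:
    """Assemble a request body for an activity log alert rule on a health category.

    Hypothetical helper: it only builds a dictionary and calls no Azure API.
    """
    if category not in ("ServiceHealth", "ResourceHealth"):
        raise ValueError(f"unsupported category: {category}")
    return {
        "location": "Global",
        "properties": {
            "enabled": True,
            # Watch the whole subscription for events of this category.
            "scopes": [f"/subscriptions/{subscription_id}"],
            "condition": {"allOf": [{"field": "category", "equals": category}]},
            "actions": {"actionGroups": [{"actionGroupId": action_group_id}]},
        },
    }


# Placeholder identifiers, not real resources.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
ACTION_GROUP_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/example-rg"
    "/providers/Microsoft.Insights/actionGroups/example-ag"
)

service_rule = health_alert_rule("ServiceHealth", SUBSCRIPTION_ID, ACTION_GROUP_ID)
resource_rule = health_alert_rule("ResourceHealth", SUBSCRIPTION_ID, ACTION_GROUP_ID)
print(json.dumps(service_rule, indent=2))
```

Both rule types are activity log alerts that differ only in the category they match, so one helper can cover both.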

Security

Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to maximize the security of Azure Monitor alerts.

Design checklist

  • Use customer-managed keys if you need your own encryption key to protect data and saved queries in your workspaces.
  • Use managed identities to increase security by controlling permissions.
  • Assign the Monitoring Reader role to all users who don’t need configuration privileges.
  • Use secure webhook actions.
  • When using action groups that use private links, use event hub actions.

Configuration recommendations

Recommendation | Benefit
Use customer-managed keys if you need your own encryption key to protect data and saved queries in your workspaces. | Azure Monitor ensures that all data and saved queries are encrypted at rest using Microsoft-managed keys (MMK). If you require your own encryption key and collect enough data for a dedicated cluster, use customer-managed keys for greater flexibility and key lifecycle control. If you use Microsoft Sentinel, make sure that you're familiar with the considerations at Set up Microsoft Sentinel customer-managed key.
To control permissions for log search alert rules, use managed identities. | A common challenge for developers is the management of secrets, credentials, certificates, and keys used to secure communication between services. Managed identities eliminate the need for developers to manage these credentials. Setting a managed identity for your log search alert rules gives you control over, and visibility into, the exact permissions of your alert rule. At any time, you can view your rule’s query permissions and add or remove permissions directly from its managed identity. In addition, a managed identity is required if your rule’s query accesses Azure Data Explorer (ADX) or Azure Resource Graph (ARG). See Managed identities.
Assign the Monitoring Reader role to all users who don’t need configuration privileges. | Enhance security by giving users the least amount of privileges required for their role. See Roles, permissions, and security in Azure Monitor.
Where possible, use secure webhook actions. | If your alert rule contains an action group that uses webhook actions, prefer secure webhook actions for additional authentication. See Configure authentication for Secure webhook.
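
To make the managed identity recommendation more concrete, the following sketch assembles a request body for a log search alert rule whose query runs under a user-assigned managed identity. The helper name, region, query, and resource IDs are placeholders invented for this example. The field layout reflects the Microsoft.Insights/scheduledQueryRules resource as best understood here and should be verified against the API version you target.

```python
import json


def log_search_rule_with_identity(
    workspace_id: str, identity_id: str, action_group_id: str, query: str
) -> dict:
    """Assemble a log search alert rule body that uses a user-assigned identity.

    Hypothetical helper: it only builds a dictionary and calls no Azure API.
    """
    return {
        "location": "eastus",  # placeholder region
        "identity": {
            "type": "UserAssigned",
            # The rule's query permissions are whatever this identity is granted.
            "userAssignedIdentities": {identity_id: {}},
        },
        "properties": {
            "severity": 2,
            "enabled": True,
            "scopes": [workspace_id],
            "evaluationFrequency": "PT15M",
            "windowSize": "PT15M",
            "criteria": {
                "allOf": [
                    {
                        "query": query,
                        "timeAggregation": "Count",
                        "operator": "GreaterThan",
                        "threshold": 0,
                    }
                ]
            },
            "actions": {"actionGroups": [action_group_id]},
        },
    }


# Placeholder resource IDs, not real resources.
BASE = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg"
rule = log_search_rule_with_identity(
    workspace_id=f"{BASE}/providers/Microsoft.OperationalInsights/workspaces/example-ws",
    identity_id=f"{BASE}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/example-id",
    action_group_id=f"{BASE}/providers/Microsoft.Insights/actionGroups/example-ag",
    query="Heartbeat | summarize count() by Computer",
)
print(json.dumps(rule["identity"], indent=2))
```

Because the permissions belong to the identity rather than to the person who authored the rule, you can review or revoke them in one place without editing the rule itself.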

Cost optimization

Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.

Note

See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.

Design checklist

  • Activity log alerts, service health alerts, and resource health alerts are free of charge.
  • When using log search alerts, minimize log search alert frequency.
  • When using metric alerts, minimize the number of resources being monitored.

Configuration recommendations

Recommendation | Benefit
Keep in mind that activity log alerts, service health alerts, and resource health alerts are free of charge. | These alert types incur no charge. If they cover what you want to monitor, use them.
When using log search alerts, minimize log search alert frequency. | The more frequently a log search alert rule is evaluated, the higher its cost. Choose the lowest evaluation frequency that still meets your detection requirements.
When using metric alerts, minimize the number of resources being monitored. | Some resource types support metric alert rules that can monitor multiple resources of the same type. For these resource types, the rule can become expensive if it monitors many resources. To reduce costs, either reduce the scope of the metric alert rule or use log search alert rules, which are less expensive for monitoring a large number of resources.
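
The arithmetic behind the last two recommendations can be sketched as follows. The two functions are hypothetical helpers that compute rough proxies for what drives cost: how often a log search rule is evaluated, and how many time series a metric rule monitors. They are not billing formulas and contain no prices; see Azure Monitor cost and usage for actual charges.

```python
def evaluations_per_month(frequency_minutes: int, days: int = 30) -> int:
    """Number of times a rule is evaluated in `days` days at the given frequency."""
    if frequency_minutes <= 0:
        raise ValueError("frequency_minutes must be positive")
    return (days * 24 * 60) // frequency_minutes


def monitored_time_series(resources: int, series_per_resource: int = 1) -> int:
    """Time series watched by one multi-resource metric alert rule."""
    if resources < 0 or series_per_resource < 0:
        raise ValueError("counts must be non-negative")
    return resources * series_per_resource


# Compare a few evaluation frequencies for a log search alert rule.
for freq in (1, 5, 15, 60):
    print(f"every {freq:>2} min: {evaluations_per_month(freq):>6} evaluations per 30 days")

# A metric rule over 200 resources, each split into 3 dimension values.
print("monitored time series:", monitored_time_series(200, 3))
```

Moving a rule from a 1-minute to a 15-minute frequency cuts the number of evaluations by a factor of 15, and narrowing a metric rule's scope reduces the monitored time series in direct proportion.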

Operational excellence

Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for supporting Azure Monitor alerts.

Design checklist

  • Use dynamic thresholds in metric alert rules where appropriate.
  • Whenever possible, use one alert rule to monitor multiple resources.
  • To control behavior at scale, use alert processing rules.
  • Leverage custom properties to enhance diagnostics.
  • Leverage Logic Apps to customize, enrich, and integrate with a variety of systems.

Configuration recommendations

Recommendation | Benefit
Use dynamic thresholds in metric alert rules where appropriate. | You may be unsure of the correct values to use as thresholds for your alert rules. Dynamic thresholds use machine learning algorithms to determine appropriate thresholds based on trends, so you don't need to know the correct threshold in advance. Dynamic thresholds are also useful for rules that monitor multiple resources, where a single threshold can't be configured for all of the resources. See Dynamic thresholds in metric alerts.
Whenever possible, use one alert rule to monitor multiple resources. | Alert rules that monitor multiple resources reduce management overhead by letting you manage one rule for a large number of resources.
To control behavior at scale, use alert processing rules. | Alert processing rules can be used to reduce the number of alert rules you need to create and manage.
Use custom properties to enhance diagnostics. | If the alert rule uses action groups, you can add your own properties to include in the alert notification payload. You can use these properties in the actions called by the action group, such as webhook, Azure function, or logic app actions.
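
As a minimal sketch of how an action might consume custom properties, the example below reads them from a notification payload shaped like the common alert schema. The sample payload is hand-written for illustration, and the `team` and `runbook` property names, the rule name, and the `summarize_alert` helper are all assumptions. A real handler would receive the payload from a webhook, Azure function, or logic app trigger.

```python
import json

# Hand-written sample payload; the values are placeholders.
SAMPLE_PAYLOAD = json.loads("""
{
  "schemaId": "azureMonitorCommonAlertSchema",
  "data": {
    "essentials": {
      "alertRule": "example-cpu-rule",
      "severity": "Sev2",
      "monitorCondition": "Fired"
    },
    "alertContext": {},
    "customProperties": {
      "team": "payments",
      "runbook": "https://example.com/runbooks/cpu"
    }
  }
}
""")


def summarize_alert(payload: dict) -> dict:
    """Pick out the fields a downstream ticketing or chat integration might need."""
    data = payload.get("data", {})
    essentials = data.get("essentials", {})
    # customProperties may be absent or null when the rule defines none.
    custom = data.get("customProperties") or {}
    return {
        "rule": essentials.get("alertRule"),
        "severity": essentials.get("severity"),
        "condition": essentials.get("monitorCondition"),
        "team": custom.get("team", "unassigned"),
        "runbook": custom.get("runbook"),
    }


summary = summarize_alert(SAMPLE_PAYLOAD)
print(summary)
```

Falling back to a default such as `unassigned` keeps the handler working for rules that don't define the property.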

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Alerts offer a high degree of performance efficiency without any design decisions.

Next step