Log alerts in Azure Monitor
Log alerts are one of the alert types that are supported in Azure Alerts. Log alerts allow users to use a Log Analytics query to evaluate resources logs every set frequency, and fire an alert based on the results. Rules can trigger one or more actions using Action Groups.
Log data from a Log Analytics workspace can be sent to the Azure Monitor metrics store. Metrics alerts have different behavior, which may be more desirable depending on the data you are working with. For information on what and how you can route logs to metrics, see Metric Alert for Logs.
Log alerts run queries on Log Analytics data. First you should start collecting log data and query the log data for issues. You can use the alert query examples topic in Log Analytics to understand what you can discover or get started on writing your own query.
Azure Monitoring Contributor is a common role that is needed for creating, modifying, and updating log alerts. Access & query execution rights for the resource logs are also needed. Partial access to resource logs can fail queries or return partial results. Learn more about configuring log alerts in Azure.
Log alerts for Log Analytics used to be managed using the legacy Log Analytics Alert API.
Query evaluation definition
Log search rules condition definition starts from:
- What query to run?
- How to use the results?
The following sections describe the different parameters you can use to set the above logic.
The Log Analytics query used to evaluate the rule. The results returned by this query are used to determine whether an alert is to be triggered. The query can be scoped to:
- A specific resource, such as a virtual machine.
- An at scale resource, such as a subscription or resource group.
- Multiple resources using cross-resource query.
Alert queries have constraints to ensure optimal performance and the relevance of the results. Learn more here.
Resource centric and cross-resource query are only supported using the current scheduledQueryRules API.
Query time Range
Time range is set in the rule condition definition. It's called Override query time range in the advance settings section.
Unlike log analytics, the time range in alerts is limited to a maximum of two days of data. Even if longer range ago command is used in the query, the time range will apply. For example, a query scans up to 2 days, even if the text contains ago(7d).
If you use ago command in the query, the range is automatically set to two days. You can also change time range manually in cases the query requires more data than the alert evaluation even if there is no ago command in the query.
Log alerts turn log into numeric values that can be evaluated. You can measure two different things:
- Result count
- Calculation of a value
Count of results is the default measure and is used when you set a Measure with a selection of Table rows. Ideal for working with events such as Windows event logs, syslog, application exceptions. Triggers when log records happen or doesn't happen in the evaluated time window.
Log alerts work best when you try to detect data in the log. It works less well when you try to detect lack of data in the logs. For example, alerting on virtual machine heartbeat.
Since logs are semi-structured data, they are inherently more latent than metric, you may experience misfires when trying to detect lack of data in the logs, and you should consider using metric alerts. You can send data to the metric store from logs using metric alerts for logs.
Example of result count use case
You want to know when your application responded with error code 500 (Internal Server Error). You would create an alert rule with the following details:
requests | where resultCode == "500"
- Aggregation granularity: 15 minutes
- Alert frequency: 15 minutes
- Threshold value: Greater than 0
Then alert rules monitors for any requests ending with 500 error code. The query runs every 15 minutes, over the last 15 minutes. If even one record is found, it fires the alert and triggers the actions configured.
Calculation of a value
Calculation of a value is used when you select a column name of a numeric column for the Measure, and the result is a calculation that you perform on the values in that column. This would be used, for example, as CPU counter value.
The calculation that is done on multiple records to aggregate them to one numeric value using the Aggregation granularity defined. For example:
- Sum returns the sum of measure column.
- Average returns the average of the measure column.
Determines the interval that is used to aggregate multiple records to one numeric value. For example, if you specified 5 minutes, records would be grouped by 5-minute intervals using the Aggregation type specified.
Split by alert dimensions
Split alerts by number or string columns into separate alerts by grouping into unique combinations. It's configured in Split by dimensions section of the condition (limited to six splits). When creating resource-centric alerts at scale (subscription or resource group scope), you can split by Azure resource ID column. Splitting on Azure resource ID column will change the target of the alert to the specified resource.
Splitting by Azure resource ID column is recommended when you want to monitor the same condition on multiple Azure resources. For example, monitoring all virtual machines for CPU usage over 80%. You may also decide not to split when you want a condition on multiple resources in the scope. Such as monitoring that at least five machines in the resource group scope have CPU usage over 80%.
Example of splitting by alert dimensions
For example, you want to monitor errors for multiple virtual machines running your web site/app in a specific resource group. You can do that using a log alert rule as follows:
// Reported errors union Event, Syslog // Event table stores Windows event records, Syslog stores Linux records | where EventLevelName == "Error" // EventLevelName is used in the Event (Windows) records or SeverityLevel== "err" // SeverityLevel is used in Syslog (Linux) records
Resource ID Column: _ResourceId
- Computer = VM1, VM2 (Filtering values in alert rules definition isn't available currently for workspaces and Application Insights. Filter in the query text.)
Aggregation granularity: 15 minutes
Alert frequency: 15 minutes
Threshold value: Greater than 0
This rule monitors if any virtual machine had error events in the last 15 minutes. Each virtual machine is monitored separately and will trigger actions individually.
Split by alert dimensions is only available for the current scheduledQueryRules API. Resource centric alerting at scale is only supported in the API version
2021-08-01 and above.
Alert logic definition
Once you define the query to run and evaluation of the results, you need to define the alerting logic and when to fire actions. The following sections describe the different parameters you can use:
Threshold and operator
The query results are transformed into a number that is compared against the threshold and operator.
The interval in which the query is run. Can be set from a minute to a day.
Number of violations to trigger alert
You can specify the alert evaluation period and the number of failures needed to trigger an alert. Allowing you to better define an impact time to trigger an alert.
For example, if your rule Aggregation granularity is defined as '5 minutes', you can trigger an alert only if three failures (15 minutes) of the last hour occurred. This setting is defined by your application business policy.
State and resolving alerts
Log alerts can either be stateless or stateful (currently in preview).
Stateless alerts fire each time the condition is met, even if fired previously. You can mark the alert as closed once the alert instance is resolved. You can also mute actions to prevent them from triggering for a period after an alert rule fired using the Mute Actions option in the alert details section.
See this alert stateless evaluation example:
|Time||Log condition evaluation||Result|
|00:05||FALSE||Alert doesn't fire. No actions called.|
|00:10||TRUE||Alert fires and action groups called. New alert state ACTIVE.|
|00:15||TRUE||Alert fires and action groups called. New alert state ACTIVE.|
|00:20||FALSE||Alert doesn't fire. No actions called. Pervious alerts state remains ACTIVE.|
Stateful alerts fire once per incident and resolve. The alert rule resolves when the alert condition isn't met for 30 minutes for a specific evaluation period (to account for log ingestion delay), and for three consecutive evaluations to reduce noise if there is flapping conditions. For example, with a frequency of 5 minutes, the alert resolve after 40 minutes or with a frequency of 1 minute, the alert resolve after 32 minutes. The resolved notification is sent out via web-hooks or email, the status of the alert instance (called monitor state) in Azure portal is also set to resolved.
Stateful alerts feature is currently in preview. You can set this using Automatically resolve alerts in the alert details section.
Location selection in log alerts
Log alerts allow you to set a location for alert rules. You can select any of the supported locations, which align to Log Analytics supported region list.
Location affects which region the alert rule is evaluated in. Queries are executed on the log data in the selected region, that said, the alert service end to end is global. Meaning alert rule definition, fired alerts, notifications, and actions aren't bound to the location in the alert rule. Data is transfer from the set region since the Azure Monitor alerts service is a non-regional service.
Pricing and billing of log alerts
Pricing information is located in the Azure Monitor pricing page. Log Alerts are listed under resource provider
- Log Alerts on Application Insights shown with exact resource name along with resource group and alert properties.
- Log Alerts on Log Analytics shown with exact resource name along with resource group and alert properties; when created using scheduledQueryRules API.
- Log alerts created from legacy Log Analytics API aren't tracked Azure Resources and don't have enforced unique resource names. These alerts are still created on
microsoft.insights/scheduledqueryrulesas hidden resources, which have this resource naming structure
<WorkspaceName>|<savedSearchId>|<scheduleId>|<ActionId>. Log Alerts on legacy API are shown with above hidden resource name along with resource group and alert properties.
Unsupported resource characters such as
<, >, %, &, \, ?, / are replaced with
_ in the hidden resource names and this will also reflect in the billing information.
Log alerts for Log Analytics used to be managed using the legacy Log Analytics Alert API and legacy templates of Log Analytics saved searches and alerts. Any alert rule management should be done using legacy Log Analytics API until you decide to switch and you can't use the hidden resources.