Optimize log search alert queries
This article describes how to write and convert log search alerts to achieve optimal performance. Optimized queries reduce latency and load of alerts, which run frequently.
Start writing an alert log query
Alert queries start from querying the log data in Log Analytics that indicates the issue. To understand what you can discover, see Using queries in Azure Monitor Log Analytics. You can also get started on writing your own query.
Queries that indicate the issue and not the alert
The alert flow was built to transform the results that indicate the issue to an alert. For example, in the case of a query like:
SecurityEvent
| where EventID == 4624
If the intent of the user is to alert, when this event type happens, the alerting logic appends count
to the query. The query that runs will be:
SecurityEvent
| where EventID == 4624
| count
There's no need to add alerting logic to the query, and doing that might even cause issues. In the preceding example, if you include count
in your query, it will always result in the value 1, because the alert service will do count
of count
.
Avoid limit and take operators
Using limit
and take
in queries can increase latency and load of alerts because the results aren't consistent over time. Use them only if needed.
Log query constraints
Log queries in Azure Monitor start with either a table, search
, or union
operator.
Queries for log search alert rules should always start with a table to define a clear scope, which improves query performance and the relevance of the results. Queries in alert rules run frequently. Using search
and union
can result in excessive overhead that adds latency to the alert because it requires scanning across multiple tables. These operators also reduce the ability of the alerting service to optimize the query.
We don't support creating or modifying log search alert rules that use search
or union
operators, except for cross-resource queries.
For example, the following alerting query is scoped to the SecurityEvent table and searches for a specific event ID. It's the only table that the query must process.
SecurityEvent
| where EventID == 4624
Log search alert rules using cross-resource queries aren't affected by this change because cross-resource queries use a type of union
, which limits the query scope to specific resources. The following example would be a valid log search alert query:
union
app('00000000-0000-0000-0000-000000000001').requests,
app('00000000-0000-0000-0000-000000000002').requests,
workspace('00000000-0000-0000-0000-000000000003').Perf
Note
Cross-resource queries are supported in the new scheduledQueryRules API.
Examples
The following examples include log queries that use search
and union
. They provide steps you can use to modify these queries for use in alert rules.
Example 1
You want to create a log search alert rule by using the following query that retrieves performance information using search
:
search *
| where Type == 'Perf' and CounterName == '% Free Space'
| where CounterValue < 30
To modify this query, start by using the following query to identify the table that the properties belong to:
search * | where CounterName == '% Free Space' | summarize by $table
The result of this query would show that the CounterName property came from the Perf table.
Use this result to create the following query that you would use for the alert rule:
Perf | where CounterName == '% Free Space' | where CounterValue < 30
Example 2
You want to create a log search alert rule by using the following query that retrieves performance information using search
:
search ObjectName =="Memory" and CounterName=="% Committed Bytes In Use"
| summarize Avg_Memory_Usage =avg(CounterValue) by Computer
| where Avg_Memory_Usage between(90 .. 95)
To modify this query, start by using the following query to identify the table that the properties belong to:
search ObjectName=="Memory" and CounterName=="% Committed Bytes In Use" | summarize by $table
The result of this query would show that the ObjectName and CounterName properties came from the Perf table.
Use this result to create the following query that you would use for the alert rule:
Perf | where ObjectName =="Memory" and CounterName=="% Committed Bytes In Use" | summarize Avg_Memory_Usage=avg(CounterValue) by Computer | where Avg_Memory_Usage between(90 .. 95)
Example 3
You want to create a log search alert rule by using the following query that uses both search
and union
to retrieve performance information:
search (ObjectName == "Processor" and CounterName == "% Idle Time" and InstanceName == "_Total")
| where Computer !in (
union *
| where CounterName == "% Processor Utility"
| summarize by Computer)
| summarize Avg_Idle_Time = avg(CounterValue) by Computer
To modify this query, start by using the following query to identify the table that the properties in the first part of the query belong to:
search (ObjectName == "Processor" and CounterName == "% Idle Time" and InstanceName == "_Total") | summarize by $table
The result of this query would show that all these properties came from the Perf table.
Use
union
with thewithsource
command to identify which source table has contributed each row:union withsource=table * | where CounterName == "% Processor Utility" | summarize by table
The result of this query would show that these properties also came from the Perf table.
Use these results to create the following query that you would use for the alert rule:
Perf | where ObjectName == "Processor" and CounterName == "% Idle Time" and InstanceName == "_Total" | where Computer !in ( (Perf | where CounterName == "% Processor Utility" | summarize by Computer)) | summarize Avg_Idle_Time = avg(CounterValue) by Computer
Example 4
You want to create a log search alert rule by using the following query that joins the results of two search
queries:
search Type == 'SecurityEvent' and EventID == '4625'
| summarize by Computer, Hour = bin(TimeGenerated, 1h)
| join kind = leftouter (
search in (Heartbeat) OSType == 'Windows'
| summarize arg_max(TimeGenerated, Computer) by Computer , Hour = bin(TimeGenerated, 1h)
| project Hour , Computer
) on Hour
To modify the query, start by using the following query to identify the table that contains the properties in the left side of the join:
search Type == 'SecurityEvent' and EventID == '4625' | summarize by $table
The result indicates that the properties in the left side of the join belong to the SecurityEvent table.
Use the following query to identify the table that contains the properties in the right side of the join:
search in (Heartbeat) OSType == 'Windows' | summarize by $table
The result indicates that the properties in the right side of the join belong to the Heartbeat table.
Use these results to create the following query that you would use for the alert rule:
SecurityEvent | where EventID == '4625' | summarize by Computer, Hour = bin(TimeGenerated, 1h) | join kind = leftouter ( Heartbeat | where OSType == 'Windows' | summarize arg_max(TimeGenerated, Computer) by Computer , Hour = bin(TimeGenerated, 1h) | project Hour , Computer ) on Hour
Next steps
- Learn about log search alerts in Azure Monitor.
- Learn about log queries.