Monitor Azure SQL Database with metrics and alerts
Applies to: Azure SQL Database
You can use Azure Monitor metrics to monitor database and elastic pool resource consumption and health. You can use alerts to send notifications when metric values indicate a potential problem.
Metrics
A metric is a series of numeric value measured at regular time intervals, often using units such as count
, percent
, bytes
, etc. Depending on the nature of the metric, you can use aggregations such as total
, count
, average
, minimum
, maximum
to calculate metric values over a duration of time. You can split some metrics by dimensions. Each dimension provides an additional context to the numeric values.
Examples of available Azure SQL Database metrics are: CPU percentage
, Data space used
, Deadlocks
, Tempdb Percent Log Used
.
Note
Some metrics apply only to specific types of databases or elastic pools. The description of each metric mentions if its use it limited to a specific database or elastic pool type, such as vCore, Hyperscale, serverless, etc.
In Azure SQL Database portal, several commonly used metrics are charted on the Monitoring tab of the Overview page. The metrics let you assess resource consumption and health of a database or an elastic pool at a glance.
Under Key metrics, select See all metrics or anywhere inside the chart to open metrics explorer. In the Metrics page, you can view all other available metrics for the database or elastic pool resource. In metrics explorer, you can change the time range, granularity, and aggregation type for the chart, change the type of chart, expand the scope to include metrics from other Azure resources, create alert rules, etc. You can also open metrics explorer by selecting Metrics menu item, under Monitoring in the resource menu.
Use metrics to monitor databases and elastic pools
You can use metrics to monitor database and elastic pool resource consumption and health. For example, you can:
- Right-size the database or elastic pool to your application workload
- Detect a gradual increase in resource consumption, and proactively scale up the database or elastic pool
- Detect and troubleshoot a performance problem
The following table describes commonly used metrics in Azure SQL Database.
Metric name | Metric ID | Description |
---|---|---|
CPU percentage | cpu_percent |
This metric shows CPU consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. For more information, see Resource consumption by user workloads and internal processes. |
SQL instance CPU percent | sql_instance_cpu_percent |
This metric shows the total CPU consumption by the user and system workloads, expressed as a percentage. Because this metric and the CPU percentage metric are measured on different scales, they are not directly comparable with each other. For more information, see Resource consumption by user workloads and internal processes. |
Data IO percentage | physical_data_read_percent |
This metric shows the data file IO consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. For more information, see Data IO governance. |
Log IO percentage | log_write_percent |
This metric shows the transaction log write throughput consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. For more information, see Transaction log rate governance. |
Workers percentage | workers_percent |
This metric shows the consumption of worker threads toward the user workload limit of a database or an elastic pool, expressed as a percentage. |
DTU percentage | dtu_consumption_percent |
This metric shows DTU consumption toward the user workload limit of a database or an elastic pool, expressed as a percentage. DTU percentage is derived from three other metrics: CPU percentage, Data IO percentage, and Log IO percentage. At any point in time, DTU percentage matches the highest value among these three metrics. |
CPU used | cpu_used |
This metric shows CPU consumption toward the user workload limit of a database or an elastic pool, expressed as the number of vCores. For more information, see Diagnose and troubleshoot high CPU on Azure SQL Database. |
DTU used | dtu_used |
This metric shows the number of DTUs used by a database or an elastic pool. |
App CPU billed | app_cpu_billed |
For serverless databases, this metric shows the amount of compute (CPU and memory) billed, expressed in vCore seconds. For more information, see Billing in the serverless compute tier. |
App CPU percentage | app_cpu_percent |
For serverless databases, this metric shows CPU consumption toward the app package maximum vCore limit, expressed as a percentage. For more information, see Monitoring in the serverless compute tier. |
App memory percentage | app_memory_percent |
For serverless databases, this metric shows memory consumption toward the app package maximum memory limit, expressed as a percentage. For more information, see Monitoring in the serverless compute tier. |
Sessions count | sessions_count |
This metric shows the number of established user sessions for a database or an elastic pool. |
Data space used | storage |
For databases, this metric shows the amount of storage space used in the data files of a database. |
Data space used | storage_used |
For elastic pools, this metric shows the amount of storage space used in the data files of all databases in an elastic pool. |
Data space allocated | allocated_data_storage |
This metric shows the amount of storage space occupied by the data files of a database, or by the data files of all databases in an elastic pool. Data files might contain empty space. Because of this, Data space allocated if often higher than Data space used for the same database or elastic pool. For more information, see Manage file space for databases in Azure SQL Database. |
Data space used percent | storage_percent |
For databases, this metric shows the amount of storage space used in the data files of a database toward the data size limit of a database. For elastic pools, it shows the amount of storage space used in the data files of all databases in an elastic pool, expressed as a percentage toward the data size limit of an elastic pool. The data size limit for a database or an elastic pool might be configured lower than the maximum data size limit. To find the maximum data size limit, see resource limits for vCore databases, vCore elastic pools, DTU databases, and DTU elastic pools. |
Data space allocated percent | allocated_data_storage_percent |
For elastic pools, this metric shows the amount of storage space occupied by the data files of all databases in an elastic pool toward the data size limit of the pool, expressed as a percentage. |
Tempdb Percent Log Used | tempdb_log_used_percent |
This metric shows the consumption of the transaction log space in the tempdb database toward the maximum log size, expressed as a percentage. For more information, see tempdb in Azure SQL Database. |
Successful Connections | connection_successful |
This metric shows the number of successfully established connections to a database. This metric can be split by two dimensions, SslProtocol and ValidatedDriverNameAndVersion , to see the number of connections using a specific encryption protocol version, or using a specific client driver. |
Failed Connections : System Errors | connection_failed |
This metric shows the number of connection attempts to a database that failed because of internal service errors. Most commonly, such errors are transient. This metric can be split by two dimensions, Error and ValidatedDriverNameAndVersion , to see the number of failed connection attempts due to a specific error, or from a specific client driver. |
Failed Connections : User Errors | connection_failed_user_error |
This metric shows the number of connection attempts to a database that failed because of user-correctable errors, such as an incorrect password or connection being blocked by firewall. This metric can be split by two dimensions, Error and ValidatedDriverNameAndVersion , to see the number of failed connection attempts due to a specific error, or from a specific client driver. |
Deadlocks | deadlock |
This metric shows the number of deadlocks in a database. |
Alerts
You can create alert rules to notify you that the value of one metric or multiple metrics is outside of an expected range.
You can set the scope of an alert rule in multiple ways to suit your needs. For example, alert rule scope can be set to:
- A single database
- An elastic pool
- All databases or elastic pools in a resource group
- All databases or elastic pools in a subscription within an Azure region
- All databases or elastic pools in a subscription within all regions
Alert rules periodically evaluate aggregated metric values over a lookback period, comparing them to a threshold value. You can configure the threshold value, evaluation frequency, and lookback period.
If an alert rule is triggered, you are notified according to your notification preferences, which you specify in the action group linked to the alert rule. For example, you can receive an email, an SMS, or a voice notification. An alert rule can also trigger actions such as webhooks, automation runbooks, functions, logic apps, etc.
To learn more about Azure Monitor alerts, see Azure Monitor alerts overview. To get familiar with metric alerts, review Metric alerts, Manage alert rules, and Action groups.
Recommended alert rules
The metrics and optimal thresholds to use in alert rules vary across the wide spectrum of customer workloads in Azure SQL Database.
The recommended alerts in the following table are a starting point to help you define the optimal alerting configuration for your Azure SQL Database resources. Depending on your requirements, your configuration might differ from this example. You might use different thresholds, evaluation frequencies, or lookback periods. You might choose to create additional alerts, or use different alert rule configurations for different applications and environments.
Here are examples of typical alert rule configurations.
Alert rule name | Metric (signal) | Alert logic | When to evaluate | Suggested severity |
---|---|---|---|---|
High user CPU usage | CPU percentage | Threshold: Static Aggregation: Average Operator: Greater than Threshold value: 90 |
Check every: 1 minute Lookback period: 10 minutes |
2 - Warning |
High total CPU usage | SQL instance CPU percent | Threshold: Static Aggregation: Average Operator: Greater than Threshold value: 90 |
Check every: 1 minute Lookback period: 10 minutes |
2 - Warning |
High worker usage | Workers percentage | Threshold: Static Aggregation: Minimum Operator: Greater than Threshold value: 60 |
Check every: 1 minute Lookback period: 5 minutes |
1 - Error |
High data IO usage | Data IO percentage | Threshold: Static Aggregation: Average Operator: Greater than Threshold value: 90 |
Check every: 1 minute Lookback period: 15 minutes |
3 - Informational |
Low data space | Data space used percent | Threshold: Static Aggregation: Minimum Operator: Greater than Threshold value: 95 |
Check every: 15 minute Lookback period: 15 minutes |
1 - Error |
Low tempdb log space |
Tempdb Percent Log Used | Threshold: Static Aggregation: Minimum Operator: Greater than Threshold value: 60 |
Check every: 1 minute Lookback period: 5 minutes |
1 - Error |
Deadlocks | Deadlocks | Threshold: Dynamic Aggregation: Total Operator: Greater than Threshold sensitivity: Medium |
Check every: 15 minutes Lookback period: 1 hour |
3 - Informational |
Failed connections (user errors) | Failed Connections : User Errors | Threshold: Dynamic Aggregation: Total Operator: Greater than Threshold sensitivity: Medium |
Check every: 5 minutes Lookback period: 15 minutes |
2 - Warning |
Failed connections (system errors) | Failed Connections : System Errors | Threshold: Static Aggregation: Total Operator: Greater than Unit: Count Threshold value: 10 |
Check every: 1 minute Lookback period: 5 minutes |
2 - Warning |
Anomalous connection rate | Successful Connections | Threshold: Dynamic Aggregation: Total Operator: Greater or Less than Threshold sensitivity: Low |
Check every: 5 minutes Lookback period: 15 minutes |
2 - Warning |
Some of the recommended alert rules use dynamic thresholds to detect anomalous metric patterns that might require attention. Alert rules based on dynamic thresholds do not trigger until sufficient historical data has been collected to establish normal patterns. For more information, see Dynamic thresholds in metric alerts.
By default, metric alerts are stateful. This means that once an alert rule is triggered, the alert is fired only once. The alert remains in the fired
state until it is resolved, at which point a resolved
notification is sent. An alert rule triggers a new alert only once the previous alert is resolved. Stateful alerts avoid frequent notifications about an ongoing condition. For more information about stateful and stateless alerts, see Alerts and state.
Related content
- Azure Monitor metrics overview
- Azure Monitor metrics aggregation and display explained
- Azure Monitor alerts overview
- Tutorial: Create a metric alert for an Azure resource
- Best practices for Azure Monitor alerts
- Troubleshooting problems in Azure Monitor alerts
- Monitoring and performance tuning in Azure SQL Database and Azure SQL Managed Instance
- Configure streaming export of Azure SQL Database and SQL Managed Instance diagnostic telemetry