Troubleshoot issues with diagnostics queries
APPLIES TO:
NoSQL
MongoDB
Cassandra
Gremlin
Table
In this article, we cover how to write simple queries to help troubleshoot issues with your Azure Cosmos DB account using diagnostics logs sent to AzureDiagnostics (legacy) and Resource-specific (preview) tables.
For Azure Diagnostics tables, all data is written into one single table and users need to specify which category they'd like to query.
For resource-specific tables, data is written into individual tables for each category of the resource (not available for table API). We recommend this mode since it makes it much easier to work with the data, provides better discoverability of the schemas, and improves performance across both ingestion latency and query times.
Here's a list of common troubleshooting queries.
Find operations that have a duration greater than 3 milliseconds.
AzureDiagnostics
| where toint(duration_s) > 3 and ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| summarize count() by clientIpAddress_s, TimeGenerated
Find user agents associated with each operation.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| summarize count() by OperationName, userAgent_s
Find operations that ran for a long time by binning their runtime into five-second intervals.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| project TimeGenerated , duration_s
| summarize count() by bin(TimeGenerated, 5s)
| render timechart
Measure skew by getting common statistics for physical partitions.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="PartitionKeyStatistics"
| project SubscriptionId, regionName_s, databaseName_s, collectionName_s, partitionKey_s, sizeKb_d, ResourceId
Measure the request charge (in RUs) for the largest queries.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" and todouble(requestCharge_s) > 10.0
| project activityId_g, requestCharge_s
| join kind= inner (
AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DOCUMENTDB" and Category == "QueryRuntimeStatistics"
| project activityId_g, querytext_s
) on $left.activityId_g == $right.activityId_g
| order by requestCharge_s desc
| limit 100
Sort operations by the amount of RU/s they're using.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2h)
| summarize max(responseLength_s), max(requestLength_s), max(requestCharge_s), count = count() by OperationName, requestResourceType_s, userAgent_s, collectionRid_s, bin(TimeGenerated, 1h)
Find queries that consume more RU/s than a baseline amount.
This query joins with data from DataPlaneRequests
and QueryRunTimeStatistics
.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" and todouble(requestCharge_s) > 100.0
| project activityId_g, requestCharge_s
| join kind= inner (
AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DOCUMENTDB" and Category == "QueryRuntimeStatistics"
| project activityId_g, querytext_s
) on $left.activityId_g == $right.activityId_g
| order by requestCharge_s desc
| limit 100
Get statistics in both request charge and duration for a specific query.
AzureDiagnostics
| where TimeGenerated >= ago(24hr)
| where Category == "QueryRuntimeStatistics"
| join (
AzureDiagnostics
| where TimeGenerated >= ago(24hr)
| where Category == "DataPlaneRequests"
) on $left.activityId_g == $right.activityId_g
| project databasename_s, collectionname_s, OperationName1 , querytext_s,requestCharge_s1, duration_s1, bin(TimeGenerated, 1min)
Group operations by the resource distribution.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2h)
| summarize count = count() by OperationName, requestResourceType_s, bin(TimeGenerated, 1h)
Get the maximum throughput for a physical partition.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2h)
| summarize max(requestCharge_s) by bin(TimeGenerated, 1h), partitionId_g
Measure RU/s consumption on a per-second basis per partition key.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| summarize total = sum(todouble(requestCharge_s)) by databaseName_s, collectionName_s, partitionKey_s, TimeGenerated
| order by TimeGenerated asc
Measure request charge per partition key.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| where parse_json(partitionKey_s)[0] == "2"
Sort partition keys based on request unit consumption within a time window.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| where TimeGenerated >= datetime("11/26/2019, 11:20:00.000 PM") and TimeGenerated <= datetime("11/26/2019, 11:30:00.000 PM")
| summarize total = sum(todouble(requestCharge_s)) by databaseName_s, collectionName_s, partitionKey_s
| order by total desc
Find logs for partition keys filtered by the size of storage per partition key.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="PartitionKeyStatistics"
| where todouble(sizeKb_d) > 800000
Measure performance for; operation latency, RU/s usage, and response length.
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2d)
| summarize percentile(todouble(responseLength_s), 50), percentile(todouble(responseLength_s), 99), max(responseLength_s), percentile(todouble(requestCharge_s), 50), percentile(todouble(requestCharge_s), 99), max(requestCharge_s), percentile(todouble(duration_s), 50), percentile(todouble(duration_s), 99), max(duration_s), count() by OperationName, requestResourceType_s, userAgent_s, collectionRid_s, bin(TimeGenerated, 1h)
Get control plane long using ControlPlaneRequests
.
Tip
Remember to switch on the flag described in Disable key-based metadata write access, and execute the operations by using Azure PowerShell, the Azure CLI, or Azure Resource Manager.
AzureDiagnostics
| where Category =="ControlPlaneRequests"
| summarize by OperationName
- For more information on how to create diagnostic settings for Azure Cosmos DB, see Creating Diagnostics settings.
- For detailed information about how to create a diagnostic setting by using the Azure portal, CLI, or PowerShell, see create diagnostic setting to collect platform logs and metrics in Azure.