适用对象: NoSQL
MongoDB
Cassandra
Gremlin
表
本文介绍如何使用发送到 AzureDiagnostics(旧版)和特定于资源(预览版)的表的诊断日志编写简单的查询,以帮助排查 Azure Cosmos DB 帐户的问题。
对于 Microsoft Azure 诊断表,所有数据都写入一个表中,用户需要指定要查询的类别。
对于特定于资源的表,数据将写入每个资源类别的各个表中(不适用于表 API)。 建议使用此模式,因为它可以大幅简化数据的处理、更好地发现架构、改善引入延迟和查询时间方面的性能。
下面是常见故障排除查询的列表。
找出持续时间超过 3 毫秒的操作。
AzureDiagnostics
| where toint(duration_s) > 3 and ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| summarize count() by clientIpAddress_s, TimeGenerated
找出与每个操作关联的用户代理。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| summarize count() by OperationName, userAgent_s
通过将其运行时以每 5 秒为间隔进行量化,来找出运行时间过长的操作。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| project TimeGenerated , duration_s
| summarize count() by bin(TimeGenerated, 5s)
| render timechart
通过获取物理分区的常见统计信息来衡量偏差。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="PartitionKeyStatistics"
| project SubscriptionId, regionName_s, databaseName_s, collectionName_s, partitionKey_s, sizeKb_d, ResourceId
衡量最大查询的请求开销(以 RU 为单位)。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" and todouble(requestCharge_s) > 10.0
| project activityId_g, requestCharge_s
| join kind= inner (
AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DOCUMENTDB" and Category == "QueryRuntimeStatistics"
| project activityId_g, querytext_s
) on $left.activityId_g == $right.activityId_g
| order by requestCharge_s desc
| limit 100
按其使用的每秒 RU 数对操作进行排序。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2h)
| summarize max(responseLength_s), max(requestLength_s), max(requestCharge_s), count = count() by OperationName, requestResourceType_s, userAgent_s, collectionRid_s, bin(TimeGenerated, 1h)
找出每秒使用的 RU 数超过某个基线数量的查询。
此查询与来自 DataPlaneRequests
和 QueryRunTimeStatistics
的数据联接。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" and todouble(requestCharge_s) > 100.0
| project activityId_g, requestCharge_s
| join kind= inner (
AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DOCUMENTDB" and Category == "QueryRuntimeStatistics"
| project activityId_g, querytext_s
) on $left.activityId_g == $right.activityId_g
| order by requestCharge_s desc
| limit 100
获取特定查询的请求费用和持续时间的统计信息。
AzureDiagnostics
| where TimeGenerated >= ago(24hr)
| where Category == "QueryRuntimeStatistics"
| join (
AzureDiagnostics
| where TimeGenerated >= ago(24hr)
| where Category == "DataPlaneRequests"
) on $left.activityId_g == $right.activityId_g
| project databasename_s, collectionname_s, OperationName1 , querytext_s,requestCharge_s1, duration_s1, bin(TimeGenerated, 1min)
按资源分布情况将操作分组。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2h)
| summarize count = count() by OperationName, requestResourceType_s, bin(TimeGenerated, 1h)
获取物理分区的最大吞吐量。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2h)
| summarize max(requestCharge_s) by bin(TimeGenerated, 1h), partitionId_g
按照每个分区键每秒,衡量每秒 RU 的使用量。
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| summarize total = sum(todouble(requestCharge_s)) by databaseName_s, collectionName_s, partitionKey_s, TimeGenerated
| order by TimeGenerated asc
衡量每个分区键的请求开销。
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| where parse_json(partitionKey_s)[0] == "2"
根据某个时间段内请求单位的使用量对分区键进行排序。
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption"
| where TimeGenerated >= datetime("11/26/2019, 11:20:00.000 PM") and TimeGenerated <= datetime("11/26/2019, 11:30:00.000 PM")
| summarize total = sum(todouble(requestCharge_s)) by databaseName_s, collectionName_s, partitionKey_s
| order by total desc
找出按每个分区键的存储大小筛选的分区键的日志。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="PartitionKeyStatistics"
| where todouble(sizeKb_d) > 800000
衡量性能、操作延迟、每个 RU 使用情况,以及响应时间。
AzureDiagnostics
| where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests"
| where TimeGenerated >= ago(2d)
| summarize percentile(todouble(responseLength_s), 50), percentile(todouble(responseLength_s), 99), max(responseLength_s), percentile(todouble(requestCharge_s), 50), percentile(todouble(requestCharge_s), 99), max(requestCharge_s), percentile(todouble(duration_s), 50), percentile(todouble(duration_s), 99), max(duration_s), count() by OperationName, requestResourceType_s, userAgent_s, collectionRid_s, bin(TimeGenerated, 1h)
使用“ControlPlaneRequests
”获取控制平面日志。
提示
切记按照禁用基于键的元数据写访问权限中所述打开标志,并通过使用 Azure PowerShell、Azure CLI 或 Azure 资源管理器执行操作。
AzureDiagnostics
| where Category =="ControlPlaneRequests"
| summarize by OperationName
- 有关如何为 Azure Cosmos DB 创建诊断设置的详细信息,请参阅创建诊断设置。
- 有关如何使用 Azure 门户、CLI 或 PowerShell 创建诊断设置的详细信息,请参阅创建诊断设置以在 Azure 中收集平台日志和指标。