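This article describes how to use the CDBDataPlaneRequests5M table to diagnose data in Azure Cosmos DB for NoSQL. This table is part of the aggregated diagnostics logs feature.
The aggregated diagnostics logs feature is designed to deliver substantial cost savings and better troubleshooting by summarizing diagnostic data into 5-minute and 15-minute intervals. Aggregated logs are written to resource-specific tables, which improves schema discoverability, ingestion latency, and overall query efficiency.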
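Tip
Logging a detailed trace for every request at scale can be expensive. Aggregated diagnostics offer a streamlined, efficient alternative that can reduce logging costs by up to 95%.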
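To make the idea of 5-minute summarization concrete, the following is a minimal sketch that rolls a few hypothetical per-request samples up into 5-minute rows. The inline `datatable`, its `DurationMs` and `RequestCharge` columns, and the sample values are assumptions made for illustration. The service's actual aggregation logic isn't described in this article and may differ.

```kusto
// Hypothetical per-request samples (illustrative only, not real log data).
datatable(TimeGenerated: datetime, OperationName: string, DurationMs: real, RequestCharge: real)
[
    datetime(2024-01-01 00:00:10), "Read",   4.0,  1.0,
    datetime(2024-01-01 00:01:30), "Read",   6.0,  1.0,
    datetime(2024-01-01 00:03:45), "Create", 12.0, 5.5,
    datetime(2024-01-01 00:06:05), "Read",   5.0,  1.0
]
// Collapse individual requests into one row per 5-minute window and operation.
| summarize SampleCount = count(),
            TotalDurationMs = sum(DurationMs),
            AvgDurationMs = avg(DurationMs),
            MaxDurationMs = max(DurationMs),
            TotalRequestCharge = sum(RequestCharge)
    by bin(TimeGenerated, 5m), OperationName
```

Each output row stands in for many requests, which is why queries against aggregated tables generally sum `SampleCount` instead of counting rows.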
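Prerequisites
An Azure subscription
- If you don't have an Azure subscription, create a trial account before you begin.
An existing Azure Cosmos DB for NoSQL account
- If you don't have an account, create a new account.
An existing Azure Monitor - Log Analytics workspace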
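Configure diagnostic settings
First, you must enable diagnostic settings. This step is required before you can use the CDBDataPlaneRequests5M table. Configure diagnostics with the following settings and values: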
| Setting | Value |
|---|---|
| Destination | Select the target Log Analytics workspace |
| Destination table | Resource-specific |
| Categories | DataPlaneRequests5M or DataPlaneRequests15M (aggregated versions only, not per-request) |
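Warning
Avoid selecting the classic DataPlaneRequests category unless you explicitly need detailed per-request logs or query analysis. The aggregated tables (CDBDataPlaneRequests5M and CDBDataPlaneRequests15M) offer significant cost benefits.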
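After you save the diagnostic setting, a quick check like the following can confirm that aggregated rows are arriving in the workspace. This is a minimal sketch: the one-hour window is an arbitrary choice, and it assumes `SampleCount` holds the number of requests summarized in each row, as the queries in this article use it.

```kusto
// Sanity check: are aggregated rows arriving, and how recent is the newest one?
CDBDataPlaneRequests5M
| where TimeGenerated > ago(1h)
| summarize AggregatedRows = count(),
            Requests = sum(SampleCount),
            LatestRow = max(TimeGenerated)
```

If the query returns zero rows, allow some time for ingestion after enabling the setting and confirm that the resource-specific destination table option is selected.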
Query the data
The following queries use the aggregated diagnostics logs feature. They can help with common troubleshooting scenarios.
//1. Are you experiencing spikes in server-side latency?
//2. Was the latency on a particular Operation?
CDBDataPlaneRequests5M
//| where TimeGenerated > now(-6h)
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize TotalDurationInMs=sum(TotalDurationMs), MaxDurationInMs=max(MaxDurationMs), MaxAvgDurationInMs=max(AvgDurationMs) by OperationName, TimeGenerated//, bin(TimeGenerated, 1d)
| render timechart
//3. Was the latency on a particular partition or many partitions?
CDBDataPlaneRequests5M
//| where TimeGenerated > now(-6h)
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize TotalDurationInMs=sum(TotalDurationMs), MaxDurationInMs=max(MaxDurationMs), MaxAvgDurationInMs=max(AvgDurationMs) by PartitionId, TimeGenerated//, bin(TimeGenerated, 1d)
| render timechart
//4. Were you also experiencing throttling? If throttled percentage is above 5% and you are experiencing high latency this is a sign to continue troubleshooting.
CDBDataPlaneRequests5M
//| where TimeGenerated > now(-6h)
//| where OperationName == "Insert from previous step if latency was on a particular operation"
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize throttledOperations=sumif(SampleCount, StatusCode == 429), totalOperations=sum(SampleCount) by TimeGenerated, OperationName
| extend throttledPercentage = 100.0 * throttledOperations / totalOperations
//| summarize count() by TimeGenerated
//| render timechart
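The 5% rule of thumb from question 4 can be applied directly in the query. The following is a minimal sketch that keeps only the intervals where more than 5% of requests returned status code 429. The six-hour window is an assumption, and multiplying by `100.0` first forces floating-point division so the ratio isn't truncated to zero by integer arithmetic.

```kusto
CDBDataPlaneRequests5M
| where TimeGenerated > ago(6h)
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize throttledOperations = sumif(SampleCount, StatusCode == 429),
            totalOperations = sum(SampleCount)
    by TimeGenerated, OperationName
// Multiply by 100.0 before dividing to get a real-valued percentage.
| extend throttledPercentage = 100.0 * throttledOperations / totalOperations
| where throttledPercentage > 5
| order by throttledPercentage desc
```

An empty result suggests throttling stayed under the threshold for every interval in the window.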
//5. Did transaction volume drastically increase/decrease recently?
CDBDataPlaneRequests5M
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize TotalRequests=sum(SampleCount) by TimeGenerated
| render timechart
//6. Did RU/s per operation increase?
//7. Did RU/s per partition increase?
CDBDataPlaneRequests5M
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize TotalRequestCharge=sum(TotalRequestCharge), MaxRequestCharge=max(MaxRequestCharge), MaxAvgRequestCharge=max(AvgRequestCharge) by OperationName, bin(TimeGenerated, 1d)//, PartitionId
| order by TotalRequestCharge desc
//8. Was there an increase in payload size for write operations?
CDBDataPlaneRequests5M
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| where OperationName in ("Create", "Upsert", "Delete", "Execute")
| summarize sum(TotalRequestLength) by TimeGenerated, OperationName
| render timechart
//9. Was there an increase in response size for read operations?
CDBDataPlaneRequests5M
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| where OperationName in ("Read", "Query")
| summarize sum(TotalResponseLength) by TimeGenerated, OperationName
| render timechart
//10. Was there an increase in server-side timeouts?
CDBDataPlaneRequests5M
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| where StatusCode == 408
| summarize sum(SampleCount) by TimeGenerated
| render timechart
//11. Was the latency on a particular client or app?
CDBDataPlaneRequests5M
//| where TimeGenerated > now(-6h)
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize TotalDurationInMs=sum(TotalDurationMs) by UserAgent, ClientIpAddress, TimeGenerated
| render timechart
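Several of the latency queries above take `max(AvgDurationMs)`, which returns the highest per-interval average, not the average across the whole period. When a request-weighted average is more useful, one option is to divide total duration by total request count. The following is a sketch under the assumption that `TotalDurationMs` is the sum of durations for the `SampleCount` requests in each row. The one-hour bin and six-hour window are arbitrary.

```kusto
CDBDataPlaneRequests5M
| where TimeGenerated > ago(6h)
| where DatabaseName == "ContosoDemo" and CollectionName == "Transactions"
| summarize TotalDurationMs = sum(TotalDurationMs),
            Requests = sum(SampleCount)
    by OperationName, bin(TimeGenerated, 1h)
// Weight each interval by its request count instead of averaging the averages.
| extend WeightedAvgDurationMs = todouble(TotalDurationMs) / Requests
| project TimeGenerated, OperationName, WeightedAvgDurationMs
| render timechart
```

Averaging the per-row averages directly would give a quiet interval the same weight as a busy one, which can hide or exaggerate a latency shift.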