监视 Azure Synapse Analytics 中的资源利用率和查询活动Monitoring resource utilization and query activity in Azure Synapse Analytics

Azure Synapse Analytics 在 Azure 门户中提供了丰富的监视体验,可以显示与数据仓库工作负载相关的见解。Azure Synapse Analytics provides a rich monitoring experience within the Azure portal to surface insights regarding your data warehouse workload. 建议使用 Azure 门户来监视数据仓库,因为它提供可配置的保持期、警报、建议,并为指标和日志提供可自定义的图表与仪表板。The Azure portal is the recommended tool when monitoring your data warehouse as it provides configurable retention periods, alerts, recommendations, and customizable charts and dashboards for metrics and logs. 在门户中,还可与具有日志分析功能的 Azure Monitor(日志)等其他 Azure 监视服务集成,这样,不仅可以针对数据仓库,而且还能为整个 Azure 分析平台提供全面的集成式监视体验。The portal also enables you to integrate with other Azure monitoring services such as Azure Monitor (logs) with Log analytics to provide a holistic monitoring experience for not only your data warehouse but also your entire Azure analytics platform for an integrated monitoring experience. 本文档介绍了可在 Synapse SQL 中使用哪些监视功能来优化和管理分析平台。This documentation describes what monitoring capabilities are available to optimize and manage your analytics platform with Synapse SQL.

资源利用率Resource utilization

Azure 门户中提供了以下可用于 Synapse SQL 的指标。The following metrics are available in the Azure portal for Synapse SQL. 这些指标通过 Azure Monitor 显示。These metrics are surfaced through Azure Monitor.

指标名称Metric Name 说明Description 聚合类型Aggregation Type
CPU 百分比CPU percentage 数据仓库所有节点的 CPU 利用率CPU utilization across all nodes for the data warehouse Avg、Min、MaxAvg, Min, Max
数据 IO 百分比Data IO percentage 数据仓库所有节点的 IO 利用率IO Utilization across all nodes for the data warehouse Avg、Min、MaxAvg, Min, Max
内存百分比Memory percentage 数据仓库所有节点的内存利用率 (SQL Server)Memory utilization (SQL Server) across all nodes for the data warehouse Avg、Min、MaxAvg, Min, Max
活动查询数Active Queries 正在系统上执行的活动查询数Number of active queries executing on the system SumSum
排队的查询数Queued Queries 等待开始执行的排队查询数Number of queued queries waiting to start executing SumSum
成功的连接数Successful Connections 针对数据库的成功连接数(登录数)Number of successful connections (logins) against the database 总和、计数Sum, Count
失败的连接数Failed Connections 针对数据库的失败连接数(登录数)Number of failed connections (logins) against the database 总和、计数Sum, Count
被防火墙阻止Blocked by Firewall 登录数据仓库受阻次数Number of logins to the data warehouse which was blocked Sum、CountSum, Count
DWU 限制DWU limit 数据仓库的服务级别目标Service level objective of the data warehouse Avg、Min、MaxAvg, Min, Max
DWU 百分比DWU percentage CPU 百分比与数据 IO 百分比之间的最大值Maximum between CPU percentage and Data IO percentage Avg、Min、MaxAvg, Min, Max
已用的 DWUDWU used DWU 限制 * DWU 百分比DWU limit * DWU percentage Avg、Min、MaxAvg, Min, Max
缓存命中百分比Cache hit percentage (缓存命中数/缓存未命中数) * 100,其中,缓存命中数是在本地 SSD 缓存中所有列存储段的总命中数,缓存未命中数是所有节点上本地 SSD 缓存中列存储段的未命中数之和(cache hits / cache miss) * 100 where cache hits is the sum of all columnstore segments hits in the local SSD cache and cache miss is the columnstore segments misses in the local SSD cache summed across all nodes Avg、Min、MaxAvg, Min, Max
缓存使用百分比Cache used percentage (已用缓存/缓存容量) * 100,其中,已用缓存是所有节点上的本地 SSD 缓存中所有字节之和,缓存容量是所有节点上的本地 SSD 缓存存储容量之和(cache used / cache capacity) * 100 where cache used is the sum of all bytes in the local SSD cache across all nodes and cache capacity is the sum of the storage capacity of the local SSD cache across all nodes Avg、Min、MaxAvg, Min, Max
本地 tempdb 百分比Local tempdb percentage 所有计算节点上的本地 tempdb 利用率 - 每五分钟发出一次值Local tempdb utilization across all compute nodes - values are emitted every five minutes 平均值、最小值、最大值Avg, Min, Max

查看指标和设置警报时的注意事项:Things to consider when viewing metrics and setting alerts:

  • 使用的 DWU 仅大概表示 SQL 池中的使用情况,不是一个综合的利用率指标。DWU used represents only a high-level representation of usage across the SQL pool and is not meant to be a comprehensive indicator of utilization. 若要确定是要纵向扩展还是纵向缩减,请考虑可能会受 DWU 影响的所有因素,例如并发性、内存、tempdb 和自适应缓存容量。To determine whether to scale up or down, consider all factors which can be impacted by DWU such as concurrency, memory, tempdb, and adaptive cache capacity. 建议以不同的 DWU 设置运行工作负载,以确定哪种设置最适合业务目标。We recommend running your workload at different DWU settings to determine what works best to meet your business objectives.
  • 为特定数据仓库(而不是服务器本身)报告失败和成功的连接。Failed and successful connections are reported for a particular data warehouse - not for the server itself.
  • 即使数据仓库处于空闲状态,内存百分比也会反映利用率,而不会反映活动工作负荷内存消耗。Memory percentage reflects utilization even if the data warehouse is in idle state - it does not reflect active workload memory consumption. 使用并跟踪这个指标以及其他指标(tempdb、gen2 缓存),以全面判断扩展额外的缓存容量是否会提高工作负荷性能以满足你的需求。Use and track this metric along with others (tempdb, gen2 cache) to make a holistic decision on if scaling for additional cache capacity will increase workload performance to meet your requirements.

查询活动Query activity

为了让用户通过 T-SQL 以编程方式监视 Synapse SQL,该服务提供了一系列动态管理视图 (DMV)。For a programmatic experience when monitoring Synapse SQL via T-SQL, the service provides a set of Dynamic Management Views (DMVs). 在主动排查和识别工作负荷的性能瓶颈时,这些视图非常有用。These views are useful when actively troubleshooting and identifying performance bottlenecks with your workload.

若要查看适用于 Synapse SQL 的 DMV 的列表,请参阅此文档To view the list of DMVs that apply to Synapse SQL, refer to this documentation.

指标和诊断日志记录Metrics and diagnostics logging

指标和日志都可导出到 Azure Monitor(确切地说,是 Azure Monitor 日志组件),并可通过日志查询以编程方式进行访问。Both metrics and logs can be exported to Azure Monitor, specifically the Azure Monitor logs component and can be programmatically accessed through log queries. Synapse SQL 的日志延迟约为 10-15 分钟。The log latency for Synapse SQL is about 10-15 minutes. 有关影响延迟的各种因素的详细信息,请访问以下文档。For more details on the factors impacting latency, visit the following documentation.

后续步骤Next steps

以下操作方法指南介绍了在监视和管理数据仓库时可以参考的常见方案和用例:The following How-to guide describes common scenarios and use cases when monitoring and managing your data warehouse: