智能见解:使用 AI 监视数据库性能并对其进行故障排除(预览)Intelligent Insights using AI to monitor and troubleshoot database performance (preview)

适用于:是 Azure SQL 数据库 是Azure SQL 托管实例 APPLIES TO: yesAzure SQL Database yesAzure SQL Managed Instance

借助 Azure SQL 数据库和 Azure SQL 托管实例中的智能见解,可了解数据库性能的情况。Intelligent Insights in Azure SQL Database and Azure SQL Managed Instance lets you know what is happening with your database performance.

智能见解使用内置智能,通过人工智能持续监视数据库使用情况,并检测导致性能不佳的干扰性事件。Intelligent Insights uses built-in intelligence to continuously monitor database usage through artificial intelligence and detect disruptive events that cause poor performance. 检测到问题后,将执行详细的分析,以生成智能见解资源日志(称为 SQLInsights)和问题的智能评估。Once detected, a detailed analysis is performed that generates an Intelligent Insights resource log (called SQLInsights) with an intelligent assessment of the issue. 此评估包含对数据库性能问题的根本原因分析,以及为性能改进而提供的可行性建议。This assessment consists of a root cause analysis of the database performance issue and, where possible, recommendations for performance improvements.

智能见解能为你做什么?What can Intelligent Insights do for you

智能见解是 Azure 内置智能的一项独特功能,提供以下功能价值:Intelligent Insights is a unique capability of Azure built-in intelligence that provides the following value:

  • 主动监视Proactive monitoring
  • 定制的性能见解Tailored performance insights
  • 数据库性能下降的早期检测Early detection of database performance degradation
  • 已检测出问题的根本原因分析Root cause analysis of issues detected
  • 性能改进建议Performance improvement recommendations
  • 数以十万计的数据库上的横向扩展功能Scale out capability on hundreds of thousands of databases
  • 对 DevOps 资源和总拥有成本的积极影响Positive impact to DevOps resources and the total cost of ownership

智能见解的工作原理How does Intelligent Insights work

智能见解可分析数据库性能,方法是比较前一个小时的数据库工作负荷和前七天的基线工作负荷。Intelligent Insights analyzes database performance by comparing the database workload from the last hour with the past seven-day baseline workload. 数据库工作负荷由确定为对数据库性能最为重要的查询(例如重复最多和最大的查询)组成。Database workload is composed of queries determined to be the most significant to the database performance, such as the most repeated and largest queries. 由于每个数据库基于其结构、数据、使用情况和应用程序都是唯一的,因此每个生成的工作负荷基线对于该工作负荷而言都具有特定性和唯一性。Because each database is unique based on its structure, data, usage, and application, each workload baseline that is generated is specific and unique to that workload. 由于独立于工作负载基线,智能见解还可监视绝对操作阈值,并检测过长的等待时间问题、关键异常和查询参数化问题,这些问题可能会对性能造成影响。Intelligent Insights, independent of the workload baseline, also monitors absolute operational thresholds and detects issues with excessive wait times, critical exceptions, and issues with query parameterizations that might affect performance.

在使用人工智能根据多个观察的指标检测到性能降低问题后,将执行分析。After a performance degradation issue is detected from multiple observed metrics by using artificial intelligence, analysis is performed. 此外还会生成包含数据库状况的智能见解的诊断日志。A diagnostics log is generated with an intelligent insight on what is happening with your database. 智能见解可对数据库性能问题从其首次出现到解决全程轻松展开跟踪。Intelligent Insights makes it easy to track the database performance issue from its first appearance until resolution. 从初始问题检测和性能改进验证到完成,跟踪每个检测到的问题的整个生命周期。Each detected issue is tracked through its lifecycle from initial issue detection and verification of performance improvement to its completion.


用于衡量和检测数据库性能问题的指标基于查询持续时间、超时请求、过长的等待时间和出错的请求制定。The metrics used to measure and detect database performance issues are based on query duration, timeout requests, excessive wait times, and errored requests. 有关指标的详细信息,请参阅检测指标For more information on metrics, see Detection metrics.

确定的数据库性能下降问题记录在 SQLInsights 日志中,包含由以下属性组成的智能项:Identified database performance degradations are recorded in the SQLInsights log with intelligent entries that consist of the following properties:

属性Property 详细信息Details
数据库信息Database information 关于在其上检测到见解的数据库的元数据,例如资源 URI。Metadata about a database on which an insight was detected, such as a resource URI.
观察的时间范围Observed time range 检测到的见解时段的开始和结束时间。Start and end time for the period of the detected insight.
受影响的指标Impacted metrics 导致生成某个见解的指标:Metrics that caused an insight to be generated:
  • 查询持续时间增加 [秒]。Query duration increase [seconds].
  • 过长等待 [秒]。Excessive waiting [seconds].
  • 超时的请求 [百分比]。Timed-out requests [percentage].
  • 出错的请求 [百分比]。Errored-out requests [percentage].
影响值Impact value 某个指标测量出的值。Value of a metric measured.
受影响的查询和错误代码Impacted queries and error codes 查询哈希或错误代码。Query hash or error code. 这些属性可用于轻松关联到受影响的查询。These can be used to easily correlate to affected queries. 提供包括查询持续时间增加、等待时间、超时计数或错误代码的指标。Metrics that consist of either query duration increase, waiting time, timeout counts, or error codes are provided.
检测Detections 在数据库中发生事件时确定的检测。Detection identified at the database during the time of an event. 有 15 种检测模式。There are 15 detection patterns. 有关详细信息,请参阅使用智能见解排查数据库性能问题For more information, see Troubleshoot database performance issues with Intelligent Insights.
根本原因分析Root cause analysis 对已识别问题的根本原因分析采用人类可读的格式。Root cause analysis of the issue identified in a human-readable format. 一些见解可能包含可行的性能改进建议。Some insights might contain a performance improvement recommendation where possible.

智能见解在发现和排查数据库性能问题方面出类拔萃。Intelligent Insights shines in discovering and troubleshooting database performance issues. 若要使用智能见解排查数据库性能问题,请参阅使用智能见解解决性能问题In order to use Intelligent Insights to troubleshoot database performance issues, see Troubleshoot performance issues with Intelligent Insights.

智能见解选项Intelligent Insights options

可用的智能见解选项包括:Intelligent Insights options available are:

智能见解选项Intelligent Insights option Azure SQL 数据库支持Azure SQL Database support Azure SQL 托管实例支持Azure SQL Managed Instance support
配置智能见解 - 配置针对数据库的智能见解分析。Configure Intelligent Insights - Configure Intelligent Insights analysis for your databases. Yes Yes
将见解流式传输到 Azure 事件中心 - 将见解流式传输到事件中心,以便进一步进行自定义集成。Stream insights to Azure Event Hubs - Stream insights to Event Hubs for further custom integrations. Yes Yes
将见解流式传输到 Azure 存储 - 将见解流式传输到 Azure 存储,以便进一步进行分析和长期存档。Stream insights to Azure Storage - Stream insights to Azure Storage for further analysis and long-term archival. Yes Yes

配置智能见解日志的导出Configure the export of the Intelligent Insights log

可以将智能见解的输出流式传输到多个目标之一进行分析:Output of the Intelligent Insights can be streamed to one of several destinations for analysis:

  • 使用流式传输到 Azure 事件中心的输出,可以开发自定义监视和警报方案Output streamed to Azure Event Hubs can be used for development of custom monitoring and alerting scenarios
  • 使用流式传输到 Azure 存储的输出,可以进行自定义应用程序开发,例如自定义报告、长期数据存档,等等。Output streamed to Azure Storage can be used for custom application development, such are for example custom reporting, long-term data archival and so forth.

与 Azure SQL Analytics、Azure 事件中心、Azure 存储或第三方消费产品的集成方式是:先在数据库的“诊断设置”边栏选项卡中启用智能见解日志记录(“SQLInsights”日志),然后配置要流式传输到这些目标之一的智能见解日志数据。Integration of Azure SQL Analytics, Azure Event Hubs, Azure Storage, or third-party products for consumption is performed through first enabling Intelligent Insights logging (the "SQLInsights" log) in the Diagnostic settings blade of a database, and then configuring Intelligent Insights log data to be streamed into one of these destinations.

要详细了解如何启用智能见解日志记录和配置要流式传输到消费产品的指标及资源日志数据,请参阅指标和诊断日志记录For more information on how to enable Intelligent Insights logging and to configure metric and resource log data to be streamed to a consuming product, see Metrics and diagnostics logging.

通过事件中心进行设置Set up with Event Hubs

若要将智能见解与事件中心配合使用,配置要流式传输到事件中心的智能见解日志数据,请参阅指标和诊断日志记录将 Azure 诊断日志流式传输到事件中心To use Intelligent Insights with Event Hubs, configure Intelligent Insights log data to be streamed to Event Hubs, see Metrics and diagnostics logging and Stream Azure diagnostics logs to Event Hubs.

若要使用事件中心设置自定义监视和警报,请参阅如何在事件中心处理指标和诊断日志To use Event Hubs to set up custom monitoring and alerting, see What to do with metrics and diagnostics logs in Event Hubs.

通过 Azure 存储进行设置Set up with Azure Storage

若要将智能见解与存储配合使用,配置要流式传输到存储的智能见解日志数据,请参阅指标和诊断日志记录流式传输到 Azure 存储To use Intelligent Insights with Storage, configure Intelligent Insights log data to be streamed to Storage, see Metrics and diagnostics logging and Stream into Azure Storage.

智能见解日志的自定义集成Custom integrations of Intelligent Insights log

若要将智能见解与第三方工具配合使用或用于自定义警报和监视开发,请参阅使用智能见解数据库性能诊断日志To use Intelligent Insights with third-party tools, or for custom alerting and monitoring development, see Use the Intelligent Insights database performance diagnostics log.

检测指标Detection metrics

用于生成智能见解的检测模型的指标基于监视以下内容得出:Metrics used for detection models that generate Intelligent Insights are based on monitoring:

  • 查询持续时间Query duration
  • 超时请求Timeout requests
  • 过长的等待时间Excessive wait time
  • 出错的请求Errored out requests

在检测数据库工作负载性能问题时,查询持续时间和超时请求将用作主要模型。Query duration and timeout requests are used as primary models in detecting issues with database workload performance. 这是因为它们直接测量工作负荷发生的情况。They're used because they directly measure what is happening with the workload. 为检测工作负荷性能降低的所有可能情况,过长的等待时间和出错的请求将用作附加模型,以指出影响工作负荷性能的问题。To detect all possible cases of workload performance degradation, excessive wait time and errored-out requests are used as additional models to indicate issues that affect the workload performance.

系统会自动将工作负荷更改和对数据库所做的查询请求数更改纳入考虑,以动态决定正常和异常的数据库性能阈值。The system automatically considers changes to the workload and changes in the number of query requests made to the database to dynamically determine normal and out-of-the-ordinary database performance thresholds.

所有指标都将通过科学派生的数据模型在各种关系中纳入考虑,并且这种模型会对每个检测到的性能问题进行归类。All of the metrics are considered together in various relationships through a scientifically derived data model that categorizes each performance issue detected. 通过智能见解提供的信息包括:Information provided through an intelligent insight includes:

  • 检测到的性能问题的详细信息。Details of the performance issue detected.
  • 检测到的问题的根本原因分析。A root cause analysis of the issue detected.
  • 关于在可能的情况下如何提升受监视数据库的性能的建议。Recommendations on how to improve the performance of the monitored database, where possible.

查询持续时间Query duration

查询持续时间下降模型分析单独的查询,并检测与性能基线相比,编译和执行查询增加的时间。The query duration degradation model analyzes individual queries and detects the increase in the time it takes to compile and execute a query compared to the performance baseline.

如果内置智能检测到查询编译或查询执行时间显著增加并影响工作负载性能,这些查询将被标记为出现查询持续时间性能降低问题。If built-in intelligence detects a significant increase in query compile or query execution time that affects workload performance, these queries are flagged as query duration performance degradation issues.

智能见解诊断日志会输出性能降低的查询的查询哈希。The Intelligent Insights diagnostics log outputs the query hash of the query degraded in performance. 查询哈希指出性能降低是否与查询编译或执行时间增加(使查询持续时间增加)有关。The query hash indicates whether the performance degradation was related to query compile or execution time increase, which increased query duration time.

超时请求Timeout requests

超时请求下降模型分析单独的查询,并在查询执行级别检测任何超时增加情况,以及与性能基线持续时间相比,数据库级别的总体请求超时。The timeout requests degradation model analyzes individual queries and detects any increase in timeouts at the query execution level and the overall request timeouts at the database level compared to the performance baseline period.

某些查询甚至在到达执行阶段之前就可能超时。Some of the queries might time out even before they reach the execution stage. 通过比较中止的辅助角色数和所提请求数的方式,内置智能还测量和分析到达数据库的所有查询,而不管它们有没有到达执行阶段。Through the means of aborted workers vs. requests made, built-in intelligence measures and analyzes all queries that reached the database whether they got to the execution stage or not.

在执行查询超时数或中止的请求辅助角色数超出系统管理阈值后,使用智能见解填充诊断日志。After the number of timeouts for executed queries or the number of aborted request workers crosses the system-managed threshold, a diagnostics log is populated with intelligent insights.

生成的见解包含超时请求数和超时查询数。The insights generated contain the number of timed-out requests and the number of timed-out queries. 指示性能降低是否与执行阶段的超时增加有关,或提供总体数据库级别。Indication of the performance degradation is related to timeout increase at the execution stage, or the overall database level is provided. 当系统认为超时增加对数据库性能至关重要时,这些查询将标记为出现超时性能降低问题。When the increase in timeouts is deemed significant to database performance, these queries are flagged as timeout performance degradation issues.

过长的等待时间Excessive wait times

过长的等待时间模型监视单个数据库查询。The excessive wait time model monitors individual database queries. 它会检测高出系统管理的绝对阈值的异常查询等待统计信息。It detects unusually high query wait stats that crossed the system-managed absolute thresholds. 以下查询过长等待时间指标通过查询存储等待统计信息 (sys.query_store_wait_stats) 进行观察:The following query excessive wait-time metrics are observed by using, Query Store Wait Stats (sys.query_store_wait_stats):

  • 达到资源限制Reaching resource limits
  • 达到弹性池资源上限Reaching elastic pool resource limits
  • 过多的辅助角色或会话线程数Excessive number of worker or session threads
  • 过多的数据库锁定Excessive database locking
  • 内存压力Memory pressure
  • 其他等待统计信息Other wait stats

达到资源限制或弹性池资源限制表示某一订阅或弹性池中的可用资源消耗已超出绝对阈值。Reaching resource limits or elastic pool resource limits denote that consumption of available resources on a subscription or in the elastic pool crossed absolute thresholds. 这些统计信息指示工作负荷性能降低。These stats indicate workload performance degradation. 过多的辅助角色或会话线程数表示如下情况:启动的工作线程数或会话数超出绝对阈值。An excessive number of worker or session threads denotes a condition in which the number of worker threads or sessions initiated crossed absolute thresholds. 这些统计信息指示工作负荷性能降低。These stats indicate workload performance degradation.

过多的数据库锁定表示如下情况:数据库锁定计数超出绝对阈值。Excessive database locking denotes a condition in which the count of locks on a database has crossed absolute thresholds. 此状态指示工作负荷性能降低。This stat indicates a workload performance degradation. 内存压力表示如下情况:请求内存授予的线程数超出绝对阈值。Memory pressure is a condition in which the number of threads requesting memory grants crossed an absolute threshold. 此状态指示工作负荷性能降低。This stat indicates a workload performance degradation.

其他等待统计信息指示如下情况:通过查询存储等待统计信息测量的其他指标超出绝对阈值。Other wait stats detection indicates a condition in which miscellaneous metrics measured through the Query Store Wait Stats crossed an absolute threshold. 这些统计信息指示工作负荷性能降低。These stats indicate workload performance degradation.

检测到过长的等待时间后,根据可用数据的情况,智能见解诊断日志将输出性能降低正在影响和已经影响的查询的哈希,以及导致查询在执行和测量等待时间中等待的指标的详细信息。After excessive wait times are detected, depending on the data available, the Intelligent Insights diagnostics log outputs hashes of the affecting and affected queries degraded in performance, details of the metrics that cause queries to wait in execution, and measured wait time.

出错的请求Errored requests

出错的请求下降模型监视单独的查询,并检测与基线持续时间相比,增加的出错查询数。The errored requests degradation model monitors individual queries and detects an increase in the number of queries that errored out compared to the baseline period. 此模型还监视超出内置智能管理的绝对阈值的关键异常。This model also monitors critical exceptions that crossed absolute thresholds managed by built-in intelligence. 系统会自动考虑向数据库提出的查询请求数,并在受监视时期说明任何工作负荷更改。The system automatically considers the number of query requests made to the database and accounts for any workload changes in the monitored period.

当系统认为测量出的出错请求数增加(与提出的总请求数有关)对工作负荷性能至关重要时,受影响的查询将标记为具有出错请求的性能降低问题。When the measured increase in errored requests relative to the overall number of requests made is deemed significant to workload performance, affected queries are flagged as errored requests performance degradation issues.

智能见解日志输出出错请求计数。The Intelligent Insights log outputs the count of errored requests. 它会指示性能降低是否与出错请求数增加或超出受监视的关键异常阈值有关,以及性能降低的测量时间。It indicates whether the performance degradation was related to an increase in errored requests or to crossing a monitored critical exception threshold and measured time of the performance degradation.

在任何受监视的关键异常超出系统管理的绝对阈值的情况下,会使用关键异常详细信息生成智能见解。If any of the monitored critical exceptions cross the absolute thresholds managed by the system, an intelligent insight is generated with critical exception details.

后续步骤Next steps