Azure SQL 数据库中可检测的查询性能瓶颈类型Detectable types of query performance bottlenecks in Azure SQL Database

适用于: Azure SQL 数据库 Azure SQL 托管实例

尝试解决某个性能瓶颈时,首先应判断该瓶颈是查询处于运行状态还是等待状态时发生的。When trying to resolve a performance bottleneck, start by determining whether the bottleneck is occurring while the query is in a running state or a waiting state. 根据这种判断运用不同的解决方法。Different resolutions apply depending upon this determination. 借助下图来了解可能导致运行相关问题或等待相关问题的因素。Use the following diagram to help understand the factors that can cause either a running-related problem or a waiting-related problem. 本文将探讨每种类型的问题及其相关解决方法。Problems and resolutions relating to each type of problem are discussed in this article.

可以使用 Azure SQL 数据库智能见解或 SQL Server DMV 来检测这些类型的性能瓶颈。You can use Azure SQL Database Intelligent Insights or SQL Server DMVs to detect these types of performance bottlenecks.

工作负荷状态

与运行相关的问题:与运行相关的问题通常与编译问题相关,这些编译问题导致查询计划欠佳,或导致与资源不足或过度使用相关的执行问题。Running-related problems: Running-related problems are generally related to compilation problems resulting in a suboptimal query plan or execution problems related to insufficient or overused resources. 与等待相关的问题:与等待相关的问题通常与以下因素有关:Waiting-related problems: Waiting-related problems are generally related to:

  • 锁(阻塞)Locks (blocking)
  • I/OI/O
  • 与 TempDB 相关的争用Contention related to TempDB usage
  • 内存授予等待Memory grant waits

导致查询计划欠佳的编译问题Compilation problems resulting in a suboptimal query plan

SQL 查询优化器生成的欠佳计划可能是查询性能缓慢的原因。A suboptimal plan generated by the SQL Query Optimizer may be the cause of slow query performance. 由于缺少索引、统计信息过时、要处理的行数估算不当,或者所需内存估算错误,SQL 查询优化器可能会生成欠佳的计划。The SQL Query Optimizer might produce a suboptimal plan because of a missing index, stale statistics, an incorrect estimate of the number of rows to be processed, or an inaccurate estimate of the required memory. 如果我们知道该查询在过去或者在其他实例上的执行速度较快,则可以比较实际的执行计划,以找到其差异。If you know the query was executed faster in the past or on another instance, compare the actual execution plans to see if they're different.

  • 使用以下方法之一识别任何缺失的索引:Identify any missing indexes using one of these methods:

    • 使用智能见解Use Intelligent Insights.
    • 适用于单一数据库和共用数据库的数据库顾问Database Advisor for single and pooled databases.
    • DMV。DMVs. 此示例演示了缺少索引的影响、如何使用 DMV 检测缺少的索引,以及实施有关缺少索引的建议所带来的影响。This example shows you the impact of a missing index, how to detect a missing indexes using DMVs, and the impact of implementing the missing index recommendation.
  • 尝试应用查询提示更新统计信息重新生成索引,以获得更好的计划。Try to apply query hints, update statistics, or rebuild indexes to get the better plan. 在 Azure SQL 数据库中启用自动计划更正,以自动缓解这些问题。Enable automatic plan correction in Azure SQL Database to automatically mitigate these problems.

    示例演示了参数化查询导致查询计划欠佳的影响、如何检测此状况,以及如何使用查询提示解决问题。This example shows the impact of a suboptimal query plan due to a parameterized query, how to detect this condition, and how to use a query hint to resolve.

  • 尝试更改数据库兼容性级别并实施智能查询处理。Try changing the database compatibility level and implementing intelligent query processing. SQL 查询优化器可能会生成不同的查询计划,具体取决于数据库的兼容性级别。The SQL Query Optimizer may generate a different query plan depending upon the compatibility level for your database. 更高的兼容性级别提供更多的智能查询处理功能Higher compatibility levels provide more intelligent query processing capabilities.

解析存在欠佳查询执行计划的查询Resolving queries with suboptimal query execution plans

以下部分介绍如何解决存在欠佳查询执行计划的查询。The following sections discuss how to resolve queries with suboptimal query execution plan.

存在参数敏感计划 (PSP) 问题的查询Queries that have parameter sensitive plan (PSP) problems

当查询优化器生成的查询执行计划仅适用于某个或某组特定的参数值,而缓存计划对于连续执行操作中所用的参数值并非最佳时,将发生参数敏感计划 (PSP) 问题。A parameter sensitive plan (PSP) problem happens when the query optimizer generates a query execution plan that's optimal only for a specific parameter value (or set of values) and the cached plan is then not optimal for parameter values that are used in consecutive executions. 这样,并非最佳的计划可能导致查询性能问题,并降低整体工作负荷吞吐量。Plans that aren't optimal can then cause query performance problems and degrade overall workload throughput.

有关参数探查和查询处理的详细信息,请参阅查询处理体系结构指南For more information on parameter sniffing and query processing, see the Query-processing architecture guide.

有多种解决方法可以缓解 PSP 问题。Several workarounds can mitigate PSP problems. 每种解决方法各有利弊:Each workaround has associated tradeoffs and drawbacks:

  • 在每次执行查询时使用 RECOMPILE 查询提示。Use the RECOMPILE query hint at each query execution. 此解决方法以编译时间和 CPU 增加为代价来换取更好的计划质量。This workaround trades compilation time and increased CPU for better plan quality. 对于需要高吞吐量的工作负荷,通常无法使用 RECOMPILE 选项。The RECOMPILE option is often not possible for workloads that require a high throughput.
  • 使用 OPTION (OPTIMIZE FOR…) 查询提示将实际参数值替代为典型的参数值,以便为大部分可能的参数值生成一个足够好的计划。Use the OPTION (OPTIMIZE FOR…) query hint to override the actual parameter value with a typical parameter value that produces a plan that's good enough for most parameter value possibilities. 此选项要求充分了解最佳参数值和相关的计划特征。This option requires a good understanding of optimal parameter values and associated plan characteristics.
  • 使用 OPTION (OPTIMIZE FOR UNKNOWN) 查询提示替代实际参数值,并改用密度向量平均值。Use the OPTION (OPTIMIZE FOR UNKNOWN) query hint to override the actual parameter value and instead use the density vector average. 还可以将传入的参数值捕获到局部变量中,然后在谓词内使用局部变量,而不是使用参数本身。You can also do this by capturing the incoming parameter values in local variables and then using the local variables within the predicates instead of using the parameters themselves. 对于此修复方法,平均密度必须足够好。For this fix, the average density must be good enough.
  • 使用 DISABLE_PARAMETER_SNIFFING 查询提示完全禁用参数探查。Disable parameter sniffing entirely by using the DISABLE_PARAMETER_SNIFFING query hint.
  • 使用 KEEPFIXEDPLAN 查询提示防止在缓存中重新编译。Use the KEEPFIXEDPLAN query hint to prevent recompilations in cache. 此解决方法假定缓存中已有的通用计划已经足够好。This workaround assumes that the good-enough common plan is the one in cache already. 还可以禁用统计信息自动更新,以减少逐出良好计划而编译新的不良计划的可能性。You can also disable automatic statistics updates to reduce the chances that the good plan will be evicted and a new bad plan will be compiled.
  • 显式使用 USE PLAN 查询提示,通过重写查询并在查询文本中添加提示来强制执行计划。Force the plan by explicitly using the USE PLAN query hint by rewriting the query and adding the hint in the query text. 或者,使用查询存储或启用自动优化来设置特定的计划。Or set a specific plan by using Query Store or by enabling automatic tuning.
  • 将单个过程替换为一组嵌套的过程,可以根据条件逻辑和关联的参数值来使用其中每个过程。Replace the single procedure with a nested set of procedures that can each be used based on conditional logic and the associated parameter values.
  • 创建动态字符串执行来替代静态过程定义。Create dynamic string execution alternatives to a static procedure definition.

有关解决 PSP 问题的详细信息,请参阅以下博客文章:For more information about resolving PSP problems, see these blog posts:

参数化不当而导致的编译活动Compile activity caused by improper parameterization

当查询包含文本时,为了减少编译次数,要么数据库引擎选择自动参数化语​​句,要么用户显式参数化语句。When a query has literals, either the database engine automatically parameterizes the statement or a user explicitly parameterizes the statement to reduce the number of compilations. 使用相同模式但不同文本值的大量查询编译可能会导致 CPU 使用率较高。A high number of compilations for a query using the same pattern but different literal values can result in high CPU usage. 同样,如果仅将查询部分参数化,导致该查询仍包含文本,则数据库引擎不会将查询进一步参数化。Similarly, if you only partially parameterize a query that continues to have literals, the database engine doesn't parameterize the query further.

以下是部分参数化查询的示例:Here's an example of a partially parameterized query:

SELECT *
FROM t1 JOIN t2 ON t1.c1 = t2.c1
WHERE t1.c1 = @p1 AND t2.c2 = '961C3970-0E54-4E8E-82B6-5545BE897F8F'

在此示例中,t1.c1 采用 @p1,而 t2.c2 继续采用文本形式的 GUID。In this example, t1.c1 takes @p1, but t2.c2 continues to take GUID as literal. 在这种情况下,如果更改 c2 的值,该查询将被视为不同的查询,并且将进行新的编译。In this case, if you change the value for c2, the query is treated as a different query, and a new compilation will happen. 若要减少此示例中的编译次数,也要参数化 GUID。To reduce compilations in this example, you would also parameterize the GUID.

以下查询显示查询哈希的查询计数,以确定查询是否已正确参数化:The following query shows the count of queries by query hash to determine whether a query is properly parameterized:

SELECT TOP 10
  q.query_hash
  , count (distinct p.query_id ) AS number_of_distinct_query_ids
  , min(qt.query_sql_text) AS sampled_query_text
FROM sys.query_store_query_text AS qt
  JOIN sys.query_store_query AS q
     ON qt.query_text_id = q.query_text_id
  JOIN sys.query_store_plan AS p
     ON q.query_id = p.query_id
  JOIN sys.query_store_runtime_stats AS rs
     ON rs.plan_id = p.plan_id
  JOIN sys.query_store_runtime_stats_interval AS rsi
     ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE
  rsi.start_time >= DATEADD(hour, -2, GETUTCDATE())
  AND query_parameterization_type_desc IN ('User', 'None')
GROUP BY q.query_hash
ORDER BY count (distinct p.query_id) DESC

影响查询计划更改的因素Factors that affect query plan changes

重新编译查询执行计划可能会导致生成的查询计划与最初缓存的计划不同。A query execution plan recompilation might result in a generated query plan that differs from the original cached plan. 有多种原因可能导致自动重新编译现有的原始计划:An existing original plan might be automatically recompiled for various reasons:

  • 查询引用的架构发生更改Changes in the schema are referenced by the query
  • 对查询引用的表进行数据更改Data changes to the tables are referenced by the query
  • 查询上下文选项已更改Query context options were changed

编译的计划可能会出于各种原因从缓存中逐出,例如:A compiled plan might be ejected from the cache for various reasons, such as:

  • 实例重启Instance restarts
  • 数据库范围的配置更改Database-scoped configuration changes
  • 内存压力Memory pressure
  • 清除缓存的显式请求Explicit requests to clear the cache

如果使用 RECOMPILE 提示,则不会缓存计划。If you use a RECOMPILE hint, a plan won't be cached.

重新编译(或者在缓存逐出后重新编译)仍可能导致从生成与原始计划相同的查询执行计划。A recompilation (or fresh compilation after cache eviction) can still result in the generation of a query execution plan that's identical to the original. 如果计划与以前的或原始的计划不同,可能的解释如下:When the plan changes from the prior or original plan, these explanations are likely:

  • 更改了物理设计:例如,新建的索引可以更有效地解决查询要求。Changed physical design: For example, newly created indexes more effectively cover the requirements of a query. 如果查询优化器认为利用这些新索引比最初选择用于第一个版本的查询执行的数据结构更有利,则可以在新的编译中使用这些新索引。The new indexes might be used on a new compilation if the query optimizer decides that using that new index is more optimal than using the data structure that was originally selected for the first version of the query execution. 对引用的对象进行任何物理更改可能会导致在编译时生成新的计划选项。Any physical changes to the referenced objects might result in a new plan choice at compile time.

  • 服务器资源差异:如果一个系统中的计划不同于另一个系统中的计划,则资源可用性(例如可用的处理器数)可能会影响生成的计划。Server resource differences: When a plan in one system differs from the plan in another system, resource availability, such as the number of available processors, can influence which plan gets generated. 例如,如果一个系统的处理器数较多,则可能会选择并行计划。For example, if one system has more processors, a parallel plan might be chosen.

  • 不同的统计信息:与引用的对象关联的统计信息发生了更改,或者与原始系统的统计信息有本质的差别。Different statistics: The statistics associated with the referenced objects might have changed or might be materially different from the original system's statistics. 如果统计信息发生更改并重新编译,则查询优化器将使用从更改时间开始的统计信息。If the statistics change and a recompilation happens, the query optimizer uses the statistics starting from when they changed. 修改后的统计信息的数据分布和频率可能不同于原始编译中的情况。The revised statistics' data distributions and frequencies might differ from those of the original compilation. 这些更改用于创建基数估算。These changes are used to create cardinality estimates. (基数估算是预计要流经逻辑查询树的行数。)更改基数估算值可以引导我们选择不同的物理运算符和关联的操作顺序。(Cardinality estimates are the number of rows that are expected to flow through the logical query tree.) Changes to cardinality estimates might lead you to choose different physical operators and associated orders of operations. 即使对统计信息进行少量的更改,也可能会导致查询执行计划发生变化。Even minor changes to statistics can result in a changed query execution plan.

  • 更改了数据库兼容性级别或基数估算器版本:更改数据库兼容性级别可以启用新的策略和功能,从而可能导致生成不同的查询执行计划。Changed database compatibility level or cardinality estimator version: Changes to the database compatibility level can enable new strategies and features that might result in a different query execution plan. 除了数据库兼容性级别以外,禁用或启用跟踪标志 4199 或更改数据库范围的配置 QUERY_OPTIMIZER_HOTFIXES 的状态也可能会影响编译时的查询执行计划选项。Beyond the database compatibility level, a disabled or enabled trace flag 4199 or a changed state of the database-scoped configuration QUERY_OPTIMIZER_HOTFIXES can also influence query execution plan choices at compile time. 跟踪标志 9481(强制旧式 CE)和 2312(强制默认 CE)也会影响计划。Trace flags 9481 (force legacy CE) and 2312 (force default CE) also affect the plan.

资源限制问题Resource limits issues

与欠佳查询计划和缺少索引无关的查询性能缓慢问题通常与资源不足或过度使用相关。Slow query performance not related to suboptimal query plans and missing indexes are generally related to insufficient or overused resources. 如果查询计划是最佳的,则原因可能是查询(和数据库)即将达到数据库、弹性池或托管实例的资源限制。If the query plan is optimal, the query (and the database) might be hitting the resource limits for the database, elastic pool, or managed instance. 例如,超过了服务级别的日志写入吞吐量限制。An example might be excess log write throughput for the service level.

如果确定问题与资源不足有关,可以升级资源来增大数据库的容量,以满足 CPU 要求。If you identify the problem as insufficient resource, you can upgrade resources to increase the capacity of your database to absorb the CPU requirements. 有关详细信息,请参阅在 Azure SQL 数据库中缩放单一数据库资源在 Azure SQL 数据库中缩放弹性池资源For more information, see Scale single database resources in Azure SQL Database and Scale elastic pool resources in Azure SQL Database. 有关缩放托管实例的信息,请参阅服务层级资源限制For information about scaling a managed instance, see Service-tier resource limits

工作负荷量增加导致的性能问题Performance problems caused by increased workload volume

应用程序流量和工作负荷量的增加可能会导致 CPU 使用率增大。An increase in application traffic and workload volume can cause increased CPU usage. 但是,要想正确诊断此问题,必须慎之又慎。But you must be careful to properly diagnose this problem. 如果出现 CPU 使用率偏高的情况,请回答以下问题来确定 CPU 使用率增大是否由工作负荷量变化而引起:When you see a high-CPU problem, answer these questions to determine whether the increase is caused by changes to the workload volume:

  • 来自应用程序的查询是否是导致高 CPU 问题的原因?Are the queries from the application the cause of the high-CPU problem?

  • 对于可以识别的 CPU 消耗量靠前的查询For the top CPU-consuming queries that you can identify:

    • 同一查询是否关联了多个执行计划?Were multiple execution plans associated with the same query? 如果是,为什么?If so, why?
    • 对于具有相同执行计划的查询,执行时间是否一致?For queries with the same execution plan, were the execution times consistent? 执行计数是否增加?Did the execution count increase? 如果是,则性能问题可能是由工作负荷增加造成。If so, the workload increase is likely causing performance problems.

总而言之,如果查询执行计划没有以不同的方式执行,CPU 使用率却随执行计数的增加而增加,则可能出现了与工作负荷增加相关的性能问题。In summary, if the query execution plan didn't execute differently but CPU usage increased along with execution count, the performance problem is likely related to a workload increase.

要得出是工作负荷量变化导致 CPU 问题的结论并不容易。It's not always easy to identify a workload volume change that's driving a CPU problem. 请考虑以下因素:Consider these factors:

  • 资源使用率发生了变化:以 CPU 使用率长时间增加到 80% 的情况为例。Changed resource usage: For example, consider a scenario where CPU usage increased to 80 percent for an extended period of time. 单凭 CPU 使用率并不能说明工作负荷量发生了变化。CPU usage alone doesn't mean the workload volume changed. 即使应用程序执行的工作负荷完全相同,查询执行计划的回归和数据分布的变化也有可能导致资源使用率的增加。Regressions in the query execution plan and changes in data distribution can also contribute to more resource usage even though the application executes the same workload.

  • 出现了新查询:应用程序有可能在不同时间发出一组新的查询。The appearance of a new query: An application might drive a new set of queries at different times.

  • 请求数量增加或减少:这种情况最能说明工作负荷量的变化。An increase or decrease in the number of requests: This scenario is the most obvious measure of a workload. 查询数量并不一定与资源利用率增加相对应。The number of queries doesn't always correspond to more resource utilization. 但是,在其他因素不变的前提下,该指标仍然是一个重要信号。However, this metric is still a significant signal, assuming other factors are unchanged.

使用智能见解检测工作负荷增大计划回归Use Intelligent Insights to detect workload increases and plan regressions.

消除与执行问题相关的欠佳计划和等待相关问题后,性能问题通常与查询可能正在等待某个资源有关。Once you have eliminated a suboptimal plan and Waiting-related problems that are related to execution problems, the performance problem is generally the queries are probably waiting for some resource. 等待相关的问题的可能原因如下:Waiting-related problems might be caused by:

  • 阻塞Blocking:

    一个查询可能持有数据库中某些对象的锁,而其他查询正在尝试访问相同的对象。One query might hold the lock on objects in the database while others try to access the same objects. 可以使用 DMV智能见解来识别阻塞的查询。You can identify blocking queries by using DMVs or Intelligent Insights.

  • IO 问题IO problems

    查询可能正在等待将页面写入数据文件或日志文件。Queries might be waiting for the pages to be written to the data or log files. 在这种情况下,请检查 DMV 中的 INSTANCE_LOG_RATE_GOVERNORWRITE_LOGPAGEIOLATCH_* 等待统计信息。In this case, check the INSTANCE_LOG_RATE_GOVERNOR, WRITE_LOG, or PAGEIOLATCH_* wait statistics in the DMV. 了解如何使用 DMV 识别 IO 性能问题See using DMVs to identify IO performance issues.

  • TempDB 问题TempDB problems

    如果工作负荷使用大量的临时表,或者计划中存在大量的 TempDB 溢出,则可能表示查询遇到了 TempDB 吞吐量问题。If the workload uses temporary tables or there are TempDB spills in the plans, the queries might have a problem with TempDB throughput. 了解如何使用 DMV 识别 TempDB 问题See using DMVs to identity TempDB issues.

  • 内存相关的问题Memory-related problems

    如果工作负荷没有足够的内存,页生存期可能会下降,或者查询获得的内存量小于所需的量。If the workload doesn't have enough memory, the page life expectancy might drop, or the queries might get less memory than they need. 在某些情况下,查询优化器中的内置智能会解决内存相关的问题。In some cases, built-in intelligence in Query Optimizer will fix memory-related problems. 了解如何使用 DMV 识别内存授予问题See using DMVs to identify memory grant issues.

用于显示最常见等待类别的方法Methods to show top wait categories

用于显示最常见等待类别的最常用方法如下:These methods are commonly used to show the top categories of wait types:

  • 使用智能见解识别由于等待时间增加而出现性能降低的查询Use Intelligent Insights to identify queries with performance degradation due to increased waits
  • 使用查询存储查找每个查询在不同时间段的等待统计信息。Use Query Store to find wait statistics for each query over time. 在查询存储中,等待类型合并成等待类别。In Query Store, wait types are combined into wait categories. 可在 sys.query_store_wait_stats 中找到等待类别到等待类型的映射。You can find the mapping of wait categories to wait types in sys.query_store_wait_stats.
  • 使用 sys.dm_db_wait_stats 返回有关查询操作期间执行的线程遇到的所有等待的信息。Use sys.dm_db_wait_stats to return information about all the waits encountered by threads that executed during a query operation. 可以使用此聚合视图来诊断 Azure SQL 数据库以及特定查询和批的性能问题。You can use this aggregated view to diagnose performance problems with Azure SQL Database and also with specific queries and batches. 查询可能正在等待资源,发生了队列等待或外部等待。Queries can be waiting on resources, queue waits, or external waits.
  • 使用 sys.dm_os_waiting_tasks 返回有关正在等待某个资源的任务队列的信息。Use sys.dm_os_waiting_tasks to return information about the queue of tasks that are waiting on some resource.

在 CPU 使用率偏高时,如果存在以下情况,查询存储和等待统计信息可能不会反映 CPU 使用率:In high-CPU scenarios, Query Store and wait statistics might not reflect CPU usage if:

  • CPU 消耗量较高的查询仍在执行。High-CPU-consuming queries are still executing.
  • 发生故障转移时,正在运行 CPU 消耗量较高的查询。The high-CPU-consuming queries were running when a failover happened.

跟踪查询存储和等待统计信息的 DMV 仅显示成功完成的查询和超时查询的结果。DMVs that track Query Store and wait statistics show results for only successfully completed and timed-out queries. 它们不显示当前正在执行的语句的数据(直到其完成)。They don't show data for currently executing statements until the statements finish. 使用动态管理视图 sys.dm_exec_requests 跟踪当前正在执行的查询以及相关的工作线程时间。Use the dynamic management view sys.dm_exec_requests to track currently executing queries and the associated worker time.

后续步骤Next steps

SQL 数据库监视和优化概述SQL Database monitoring and tuning overview