Azure Synapse Analytics workload importance

This article explains how workload importance can influence the order of execution for Synapse SQL pool requests in Azure Synapse.

Importance

Business needs can require some data warehousing workloads to be more important than others. Consider a scenario where mission-critical sales data must be loaded before the fiscal period closes. Data loads for other sources, such as weather data, don't have strict SLAs. Setting high importance for a request that loads sales data and low importance for a request that loads weather data ensures that the sales data load gets first access to resources and completes more quickly.

Importance levels

There are five levels of importance: low, below_normal, normal, above_normal, and high. Requests that don't set importance are assigned the default level of normal. Requests that share an importance level keep the same scheduling behavior that exists today.
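The ordering of the five levels and the default can be sketched in a few lines of Python. This is only an illustrative model: the `Importance` enum, its numeric values, and the `effective_importance` helper are hypothetical names chosen for this sketch, not part of any Synapse API.

```python
from enum import IntEnum
from typing import Optional


class Importance(IntEnum):
    """The five importance levels, lowest to highest.

    The numeric values are an assumption of this sketch; only their order matters.
    """
    LOW = 1
    BELOW_NORMAL = 2
    NORMAL = 3
    ABOVE_NORMAL = 4
    HIGH = 5


def effective_importance(level: Optional[str]) -> Importance:
    """Return the level for a request, falling back to NORMAL when none is set."""
    if level is None:
        return Importance.NORMAL
    return Importance[level.upper()]


# A request with no importance set is treated as normal.
unset = effective_importance(None)
# Levels compare by rank, so above_normal outranks normal.
outranks = effective_importance("above_normal") > effective_importance("normal")
```

Using an ordered enum keeps comparisons between levels explicit, which the scheduling sketches for locking and mixed-size requests rely on conceptually.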

Importance scenarios

Beyond the basic importance scenario described above with sales and weather data, there are other scenarios where workload importance helps meet data processing and querying needs.

Locking

Access to locks for read and write activity is one area of natural contention. Activities such as partition switching or RENAME OBJECT require elevated locks. Without workload importance, Synapse SQL pool in Azure Synapse optimizes for throughput. Optimizing for throughput means that when running and queued requests have the same locking needs and resources are available, the queued requests can bypass requests with higher locking needs that arrived in the request queue earlier. Once workload importance is applied to requests with higher locking needs, requests with higher importance run before requests with lower importance.

Consider the following example:

  • Q1 is actively running and selecting data from SalesFact.
  • Q2 is queued waiting for Q1 to complete. It was submitted at 9:00 am and is attempting to partition switch new data into SalesFact.
  • Q3 is submitted at 9:01 am and wants to select data from SalesFact.

If Q2 and Q3 have the same importance and Q1 is still executing, Q3 will begin executing. Q2 will continue to wait for an exclusive lock on SalesFact. If Q2 has higher importance than Q3, Q3 will wait until Q2 is finished before it can begin execution.
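The Q1/Q2/Q3 example can be replayed with a small toy model. This is a sketch under stated assumptions, not the actual Synapse scheduler: the `Request` class, the two lock modes, the numeric importance ranks, and the `startable` function are all hypothetical, and the model only captures the rule that a lower-importance request may not bypass a waiting higher-importance one.

```python
from dataclasses import dataclass
from typing import List

NORMAL, HIGH = 3, 5  # hypothetical numeric ranks; only their order matters


@dataclass
class Request:
    name: str
    arrival: int  # minutes after 9:00 am
    lock: str     # "shared" (select) or "exclusive" (partition switch)
    importance: int = NORMAL


def compatible(lock: str, held: List[str]) -> bool:
    """Shared locks coexist with each other; an exclusive lock needs the table to itself."""
    if lock == "exclusive":
        return not held
    return "exclusive" not in held


def startable(running: List[Request], queued: List[Request]) -> List[str]:
    """Return the names of queued requests that may start right now."""
    held = [r.lock for r in running]
    started: List[str] = []
    blocked_at = None  # importance of the first request found waiting on a lock
    for req in sorted(queued, key=lambda r: (-r.importance, r.arrival)):
        if blocked_at is not None and req.importance < blocked_at:
            break  # lower importance may not bypass a waiting higher-importance request
        if compatible(req.lock, held):
            started.append(req.name)
            held.append(req.lock)
        elif blocked_at is None:
            blocked_at = req.importance
    return started


# Q1's submission time is not given in the example; -1 just means "before 9:00 am".
q1 = Request("Q1", arrival=-1, lock="shared")
q3 = Request("Q3", arrival=1, lock="shared")

# Same importance: Q3 bypasses Q2, which keeps waiting for its exclusive lock.
same_importance = startable([q1], [Request("Q2", 0, "exclusive"), q3])

# Q2 has higher importance: nothing starts while Q1 runs, so Q3 waits behind Q2.
q2_high = Request("Q2", 0, "exclusive", HIGH)
while_q1_runs = startable([q1], [q2_high, q3])

# Once Q1 finishes, Q2 takes the exclusive lock and Q3 still waits for it.
after_q1 = startable([], [q2_high, q3])
```

In this model, `same_importance` contains only Q3, `while_q1_runs` is empty, and `after_q1` contains only Q2, which matches the two outcomes described above.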

Non-uniform requests

Another scenario where importance can help meet querying demands is when requests with different resource classes are submitted. As was previously mentioned, under the same importance, Synapse SQL pool in Azure Synapse optimizes for throughput. When mixed-size requests (such as smallrc and mediumrc) are queued, Synapse SQL pool chooses the earliest-arriving request that fits within the available resources. If workload importance is applied, the highest-importance request is scheduled next.

Consider the following example on DW500c:

  • Q1, Q2, Q3, and Q4 are running smallrc queries.
  • Q5 is submitted with the mediumrc resource class at 9:00 am.
  • Q6 is submitted with the smallrc resource class at 9:01 am.

Because Q5 is mediumrc, it requires two concurrency slots. Q5 needs to wait for two of the running queries to complete. However, when one of the running queries (Q1-Q4) completes, Q6 is scheduled immediately because the resources exist to execute the query. If Q5 has higher importance than Q6, Q6 waits until Q5 is running before it can begin executing.
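The slot arithmetic in the Q5/Q6 example can be checked with a short sketch. It is a simplified model, not the real scheduler: `QueuedRequest`, `pick_next`, and the numeric importance ranks are hypothetical. The slot costs (two for mediumrc, one for smallrc) follow the example, and the sketch assumes, as the example implies, that no slots are free while Q1 to Q4 are all running.

```python
from dataclasses import dataclass
from typing import List, Optional

NORMAL, HIGH = 3, 5  # hypothetical numeric ranks; only their order matters


@dataclass
class QueuedRequest:
    name: str
    arrival: int  # minutes after 9:00 am
    slots: int    # concurrency slots required (mediumrc = 2, smallrc = 1 in this example)
    importance: int = NORMAL


def pick_next(free_slots: int, queued: List[QueuedRequest]) -> Optional[str]:
    """Return the name of the request to schedule next, or None if it must wait.

    Only requests at the highest queued importance are considered. Among them,
    the earliest arrival that fits in the free slots is chosen.
    """
    if not queued:
        return None
    top = max(r.importance for r in queued)
    candidates = sorted(
        (r for r in queued if r.importance == top), key=lambda r: r.arrival
    )
    for req in candidates:
        if req.slots <= free_slots:
            return req.name
    return None


q6 = QueuedRequest("Q6", arrival=1, slots=1)

# One of Q1-Q4 completes, so one slot is free. With equal importance, Q6 fits and runs.
equal_importance = pick_next(1, [QueuedRequest("Q5", 0, 2), q6])

# Q5 has higher importance: with one free slot nothing is scheduled, so Q6 waits.
q5_high = QueuedRequest("Q5", 0, 2, HIGH)
one_slot_free = pick_next(1, [q5_high, q6])

# After a second running query completes, two slots are free and Q5 is scheduled.
two_slots_free = pick_next(2, [q5_high, q6])
```

Here `equal_importance` is Q6, `one_slot_free` is `None`, and `two_slots_free` is Q5, which mirrors the behavior described for both the equal-importance and the higher-importance case.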

Next steps