Azure Synapse Analytics – Workload Management Portal Monitoring (Preview)

This article explains how to monitor workload group resource utilization and query activity. For details on how to configure the Azure Metrics Explorer, see the Getting started with Azure Metrics Explorer article. For details on how to monitor system resource consumption, see the Resource utilization section of the Azure SQL Data Warehouse Monitoring documentation. Two categories of workload group metrics are provided for monitoring workload management: resource allocation and query activity. These metrics can be split and filtered by workload group. They can also be split and filtered by whether the workload group is system-defined (resource class workload groups) or user-defined (created by a user with the CREATE WORKLOAD GROUP syntax).

Workload management metric definitions

Each metric is listed below with its name, description, and aggregation type.
Metric name: Effective cap resource percent
Description: Effective cap resource percent is a hard limit on the percentage of resources accessible by the workload group, taking into account the Effective min resource percent allocated to other workload groups. The metric is configured using the CAP_PERCENTAGE_RESOURCE parameter in the CREATE WORKLOAD GROUP syntax. See the CREATE WORKLOAD GROUP (Transact-SQL) documentation for how the effective value is derived.

For example, if a workload group DataLoads is created with CAP_PERCENTAGE_RESOURCE = 100 and another workload group has an Effective min resource percent of 25%, the Effective cap resource percent for the DataLoads workload group is 75% (100% - 25%).

The Effective cap resource percent determines the upper bound of concurrency (and thus potential throughput) a workload group can achieve. If more throughput is needed than the Effective cap resource percent metric currently allows, either increase CAP_PERCENTAGE_RESOURCE, decrease the MIN_PERCENTAGE_RESOURCE of other workload groups, or scale up the instance to add more resources. Decreasing REQUEST_MIN_RESOURCE_GRANT_PERCENT can increase concurrency, but may not increase overall throughput.
Aggregation type: Min, Avg, Max
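
The DataLoads example above can be sketched as a pair of workload group definitions. This is a minimal illustration, not a recommended configuration: the second group name (wgOther) and both REQUEST_MIN_RESOURCE_GRANT_PERCENT values are assumptions added so the statements are complete, and the effective values reported by the metrics may also be adjusted by the service level.

```sql
-- DataLoads is allowed up to 100% of resources and reserves nothing for itself.
-- The 25% grant per query is an assumed value.
CREATE WORKLOAD GROUP DataLoads
WITH ( MIN_PERCENTAGE_RESOURCE = 0
      ,CAP_PERCENTAGE_RESOURCE = 100
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25);

-- Hypothetical second group that reserves 25% of resources for itself.
-- That reservation is what lowers the effective cap of DataLoads to 75%.
CREATE WORKLOAD GROUP wgOther
WITH ( MIN_PERCENTAGE_RESOURCE = 25
      ,CAP_PERCENTAGE_RESOURCE = 100
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);
```

With both groups in place, filtering the Effective cap resource percent metric to DataLoads would be expected to show 75% rather than the configured 100%.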

Metric name: Effective min resource percent
Description: Effective min resource percent is the minimum percentage of resources reserved and isolated for the workload group, taking into account the service level minimum. The metric is configured using the MIN_PERCENTAGE_RESOURCE parameter in the CREATE WORKLOAD GROUP syntax. See the CREATE WORKLOAD GROUP (Transact-SQL) documentation for how the effective value is derived.

When this metric is unfiltered and unsplit, use the Sum aggregation type to monitor the total workload isolation configured on the system.

The Effective min resource percent determines the lower bound of guaranteed concurrency (and thus guaranteed throughput) a workload group can achieve. If more guaranteed resources are needed than the Effective min resource percent metric currently reports, increase the MIN_PERCENTAGE_RESOURCE parameter configured for the workload group. Decreasing REQUEST_MIN_RESOURCE_GRANT_PERCENT can increase concurrency, but may not increase overall throughput.
Aggregation type: Min, Avg, Max
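
As a rough cross-check of the Sum aggregation described above, the configured isolation can also be read from the catalog. The sketch below assumes the sys.workload_management_workload_groups catalog view and the column names shown are available on your pool; verify them against the catalog view documentation. It returns configured values, which can differ from the effective values the metrics report.

```sql
-- Configured (not effective) settings for each workload group.
SELECT name
      ,min_percentage_resource
      ,cap_percentage_resource
      ,request_min_resource_grant_percent
FROM sys.workload_management_workload_groups
ORDER BY name;

-- Total isolation configured across all workload groups.
SELECT SUM(min_percentage_resource) AS total_configured_isolation_percent
FROM sys.workload_management_workload_groups;
```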

Metric name: Workload group active queries
Description: This metric reports the active queries within the workload group. Used unfiltered and unsplit, it displays all active queries running on the system.
Aggregation type: Sum

Metric name: Workload group allocation by max resource percent
Description: This metric displays the percentage allocation of resources relative to the Effective cap resource percent per workload group. It provides the effective utilization of the workload group.

Consider a workload group DataLoads with an Effective cap resource percent of 75% and REQUEST_MIN_RESOURCE_GRANT_PERCENT configured at 25%. If a single query were running in this workload group, the Workload group allocation by max resource percent value filtered to DataLoads would be 33% (25% / 75%).

Use this metric to identify a workload group's utilization. A value close to 100% indicates that all resources available to the workload group are being used. If, in addition, the Workload group queued queries metric for the same workload group shows a value greater than zero, the workload group would use additional resources if they were allocated. Conversely, if this metric is consistently low and Workload group active queries is also low, the workload group is not being utilized. This situation is especially problematic if Effective min resource percent is greater than zero, because it indicates underutilized workload isolation.
Aggregation type: Min, Avg, Max

Metric name: Workload group allocation by system percent
Description: This metric displays the percentage allocation of resources relative to the entire system.

Consider a workload group DataLoads with REQUEST_MIN_RESOURCE_GRANT_PERCENT configured at 25%. If a single query were running in this workload group, the Workload group allocation by system percent value filtered to DataLoads would be 25% (25% / 100%).
Aggregation type: Min, Avg, Max
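
The two allocation metrics above differ only in their denominator: the group's effective cap versus the whole system. The sketch below reproduces the DataLoads arithmetic for one running query and extends it to two and three. The extension is an assumption that each running query receives exactly the 25% grant; it is meant as a worked calculation, not a query against live metrics.

```sql
-- Inputs from the DataLoads examples: 25% grant per query, 75% effective cap.
SELECT 1 AS running_queries
      ,100.0 * (1 * 25.0) / 75.0  AS allocation_by_max_resource_percent  -- about 33, as in the text
      ,100.0 * (1 * 25.0) / 100.0 AS allocation_by_system_percent        -- 25, as in the text
UNION ALL
SELECT 2
      ,100.0 * (2 * 25.0) / 75.0
      ,100.0 * (2 * 25.0) / 100.0
UNION ALL
SELECT 3
      ,100.0 * (3 * 25.0) / 75.0
      ,100.0 * (3 * 25.0) / 100.0;
```

Under these assumptions, the third concurrent query brings the group to its effective cap while the system as a whole is only 75% allocated.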

Metric name: Workload group query timeouts
Description: Queries for the workload group that have timed out. This metric reports only timeouts that occur after a query has started executing; it does not include wait time due to locking or resource waits.

Query timeout is configured using the QUERY_EXECUTION_TIMEOUT_SEC parameter in the CREATE WORKLOAD GROUP syntax. Increasing the value could reduce the number of query timeouts.

Consider increasing the REQUEST_MIN_RESOURCE_GRANT_PERCENT parameter for the workload group to allocate more resources per query and reduce the number of timeouts. Note that increasing REQUEST_MIN_RESOURCE_GRANT_PERCENT reduces the concurrency available to the workload group.
Aggregation type: Sum
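
As a minimal sketch of where the timeout is set, the statement below creates a workload group with an execution timeout. The group name wgReports and every value are hypothetical and chosen only for illustration.

```sql
-- Hypothetical group: the name and all values below are illustrative.
CREATE WORKLOAD GROUP wgReports
WITH ( MIN_PERCENTAGE_RESOURCE = 0
      ,CAP_PERCENTAGE_RESOURCE = 50
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5
      ,QUERY_EXECUTION_TIMEOUT_SEC = 3600);  -- time out queries that execute for more than one hour
```

If the Workload group query timeouts metric stays above zero for such a group, a larger QUERY_EXECUTION_TIMEOUT_SEC or a larger per-query grant are the two settings to revisit.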

Metric name: Workload group queued queries
Description: Queries for the workload group that are currently queued waiting to start execution. Queries can be queued because they are waiting for resources or locks.

Queries could be waiting for numerous reasons. If the system is overloaded and the concurrency demand is greater than what is available, queries will queue.

Consider adding more resources to the workload group by increasing the CAP_PERCENTAGE_RESOURCE parameter in the CREATE WORKLOAD GROUP statement. If CAP_PERCENTAGE_RESOURCE is greater than the Effective cap resource percent metric, the workload isolation configured for other workload groups is limiting the resources allocated to this workload group. Consider lowering the MIN_PERCENTAGE_RESOURCE of other workload groups, or scale up the instance to add more resources.
Aggregation type: Sum
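
To see which requests are behind a non-zero queued count, one option is to query the request DMV directly. The sketch below assumes that sys.dm_pdw_exec_requests exposes a group_name column on your pool and that queued requests appear with a status of 'Suspended'; confirm both against the DMV documentation before relying on it.

```sql
-- Count running and queued (suspended) requests per workload group.
SELECT group_name
      ,status
      ,COUNT(*) AS request_count
FROM sys.dm_pdw_exec_requests
WHERE status IN ('Running', 'Suspended')
GROUP BY group_name, status
ORDER BY group_name, status;
```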

Monitoring scenarios and actions

The following chart configurations show how to use the workload management metrics for troubleshooting, along with actions that address each issue.

Underutilized workload isolation

Consider the following workload group and classifier configuration, in which a workload group named wgPriority is created and the TheCEO membername is mapped to it using the wcCEOPriority workload classifier. The wgPriority workload group has 25% workload isolation configured for it (MIN_PERCENTAGE_RESOURCE = 25). Each query submitted by TheCEO is given 5% of system resources (REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5).

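-- Workload group with 25% isolation, a 50% cap, and a 5% grant per query.
CREATE WORKLOAD GROUP wgPriority
WITH ( MIN_PERCENTAGE_RESOURCE = 25
      ,CAP_PERCENTAGE_RESOURCE = 50
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);

-- Route requests submitted by TheCEO to wgPriority.
CREATE WORKLOAD CLASSIFIER wcCEOPriority
WITH ( WORKLOAD_GROUP = 'wgPriority'
      ,MEMBERNAME = 'TheCEO');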

The chart below is configured as follows:
Metric 1: Effective min resource percent (Avg aggregation, blue line)
Metric 2: Workload group allocation by system percent (Avg aggregation, purple line)
Filter: [Workload Group] = wgPriority
[Figure: underutilized-wg.png]
The chart shows that with 25% workload isolation, only 10% is being used on average. In this case, the MIN_PERCENTAGE_RESOURCE parameter value could be lowered to between 10 and 15, allowing other workloads on the system to consume the resources.
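
One way to apply that change is sketched below. It assumes ALTER WORKLOAD GROUP is available on your pool; if it is not, the classifier and workload group would need to be dropped and recreated with the new value. The value 15 is simply one point in the suggested range and remains a multiple of the 5% per-query grant.

```sql
-- Release 10 percentage points of reserved resources back to other workloads.
ALTER WORKLOAD GROUP wgPriority
WITH ( MIN_PERCENTAGE_RESOURCE = 15);
```

After the change, the Effective min resource percent metric filtered to wgPriority should settle at the new value, which confirms that the isolation was actually reduced.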

Workload group bottleneck

Consider the following workload group and classifier configuration, in which a workload group named wgDataAnalyst is created and the DataAnalyst membername is mapped to it using the wcDataAnalyst workload classifier. The wgDataAnalyst workload group has 6% workload isolation configured for it (MIN_PERCENTAGE_RESOURCE = 6) and a resource limit of 9% (CAP_PERCENTAGE_RESOURCE = 9). Each query submitted by DataAnalyst is given 3% of system resources (REQUEST_MIN_RESOURCE_GRANT_PERCENT = 3).

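-- Workload group with 6% isolation, a 9% cap, and a 3% grant per query.
CREATE WORKLOAD GROUP wgDataAnalyst
WITH ( MIN_PERCENTAGE_RESOURCE = 6
      ,CAP_PERCENTAGE_RESOURCE = 9
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 3);

-- Route requests submitted by DataAnalyst to wgDataAnalyst.
CREATE WORKLOAD CLASSIFIER wcDataAnalyst
WITH ( WORKLOAD_GROUP = 'wgDataAnalyst'
      ,MEMBERNAME = 'DataAnalyst');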

The chart below is configured as follows:
Metric 1: Effective cap resource percent (Avg aggregation, blue line)
Metric 2: Workload group allocation by max resource percent (Avg aggregation, purple line)
Metric 3: Workload group queued queries (Sum aggregation, turquoise line)
Filter: [Workload Group] = wgDataAnalyst
[Figure: bottle-necked-wg]
The chart shows that with a 9% cap on resources, the workload group is more than 90% utilized (from the Workload group allocation by max resource percent metric). Queries are queuing steadily, as shown by the Workload group queued queries metric. In this case, increasing CAP_PERCENTAGE_RESOURCE to a value higher than 9% will allow more queries to execute concurrently. Increasing CAP_PERCENTAGE_RESOURCE assumes that enough resources are available and not isolated by other workload groups. Verify that the cap increased by checking the Effective cap resource percent metric. If more throughput is desired, also consider increasing REQUEST_MIN_RESOURCE_GRANT_PERCENT to a value greater than 3. Increasing REQUEST_MIN_RESOURCE_GRANT_PERCENT could allow queries to run faster.
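
A minimal sketch of the cap increase follows. It assumes ALTER WORKLOAD GROUP is available on your pool, and the value 12 is illustrative: with a 3% grant per query, a 9% cap allows three concurrent queries (9 / 3), and a 12% cap would allow four (12 / 3), provided the effective cap actually reaches 12%.

```sql
-- Raise the cap so one more 3% query can run concurrently.
ALTER WORKLOAD GROUP wgDataAnalyst
WITH ( CAP_PERCENTAGE_RESOURCE = 12);
```

If Workload group queued queries remains above zero after the change, compare the configured cap with the Effective cap resource percent metric to check whether isolation configured for other workload groups is holding the effective value down.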

Next steps

Quickstart: Configure workload isolation using T-SQL
CREATE WORKLOAD GROUP (Transact-SQL)
CREATE WORKLOAD CLASSIFIER (Transact-SQL)
Monitoring resource utilization