Azure Synapse Analytics – Workload Management Portal Monitoring

This article explains how to monitor workload group resource utilization and query activity. For details on how to configure the Azure Metrics Explorer, see the Getting started with Azure Metrics Explorer article. For details on how to monitor system resource consumption, see the Resource utilization section in the Azure Synapse Analytics monitoring documentation. Two categories of workload group metrics are provided for monitoring workload management: resource allocation and query activity. These metrics can be split and filtered by workload group, including by whether a group is system-defined (resource class workload groups) or user-defined (created with the CREATE WORKLOAD GROUP syntax).

Workload management metric definitions

Each metric below is listed with its name, a description, and the aggregation types it supports.
Effective cap resource percent
Effective cap resource percent is a hard limit on the percentage of resources accessible by the workload group, taking into account the Effective min resource percent allocated to other workload groups. It is configured using the CAP_PERCENTAGE_RESOURCE parameter in the CREATE WORKLOAD GROUP syntax. The effective value is described here.

For example, if a workload group DataLoads is created with CAP_PERCENTAGE_RESOURCE = 100 and another workload group is created with an Effective min resource percent of 25%, the Effective cap resource percent for the DataLoads workload group is 75%.

The Effective cap resource percent determines the upper bound of concurrency (and thus potential throughput) that a workload group can achieve. If more throughput is needed than the Effective cap resource percent metric currently reports, either increase CAP_PERCENTAGE_RESOURCE, decrease the MIN_PERCENTAGE_RESOURCE of other workload groups, or scale up the instance to add more resources. Decreasing REQUEST_MIN_RESOURCE_GRANT_PERCENT can increase concurrency, but may not increase overall throughput.
Aggregation types: Min, Avg, Max
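The DataLoads example above can be written out as arithmetic. This is a simplified sketch: it assumes the effective cap is the smaller of the configured CAP_PERCENTAGE_RESOURCE and the share left over by the other groups' effective minimums, and it ignores any rounding the service applies for the current service level. Here C is CAP_PERCENTAGE_RESOURCE, m_g is the Effective min resource percent of another group g, c is the Effective cap resource percent, and r is REQUEST_MIN_RESOURCE_GRANT_PERCENT; the value r = 25 is borrowed from the DataLoads allocation example under Workload group allocation by max resource percent.

```latex
\begin{aligned}
c_{\text{DataLoads}} &= \min\Big(C_{\text{DataLoads}},\; 100 - \sum_{g \neq \text{DataLoads}} m_g\Big)
                      = \min(100,\; 100 - 25) = 75 \\
\text{max concurrent queries} &\approx \left\lfloor \frac{c}{r} \right\rfloor
                      = \left\lfloor \frac{75}{25} \right\rfloor = 3
\end{aligned}
```

Dividing the effective cap by the per-query grant is what makes it an upper bound on concurrency: with 75% available and 25% granted per query, a fourth query has no room left and must wait.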
Effective min resource percent
Effective min resource percent is the minimum percentage of resources reserved and isolated for the workload group, taking into account the service level minimum. It is configured using the MIN_PERCENTAGE_RESOURCE parameter in the CREATE WORKLOAD GROUP syntax. The effective value is described here.

Use the Sum aggregation type with this metric unfiltered and unsplit to monitor the total workload isolation configured on the system.

The Effective min resource percent determines the lower bound of guaranteed concurrency (and thus guaranteed throughput) that a workload group can achieve. If more guaranteed resources are needed than the Effective min resource percent metric currently reports, increase the MIN_PERCENTAGE_RESOURCE parameter configured for the workload group. Decreasing REQUEST_MIN_RESOURCE_GRANT_PERCENT can increase concurrency, but may not increase overall throughput.
Aggregation types: Min, Avg, Max
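By the same reasoning as for the cap, dividing the Effective min resource percent (m) by REQUEST_MIN_RESOURCE_GRANT_PERCENT (r) approximates how many queries the group can always run at once, regardless of what other groups are doing. The values m = 10 and r = 2 below are hypothetical, chosen only to illustrate the division, and service-level rounding is ignored.

```latex
\text{guaranteed concurrent queries} \approx \left\lfloor \frac{m}{r} \right\rfloor,
\qquad \text{e.g. } m = 10,\; r = 2 \;\Rightarrow\; \left\lfloor \frac{10}{2} \right\rfloor = 5
```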
Workload group active queries
This metric reports the active queries within the workload group. Used unfiltered and unsplit, it displays all active queries running on the system.
Aggregation type: Sum
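For a point-in-time cross-check of this metric from a SQL session, a query along these lines counts running requests per workload group. It is a sketch that assumes the sys.dm_pdw_exec_requests view exposes a group_name column holding the workload group each request was classified into; confirm the column against your service version before relying on it. The query itself runs as a request, so it is counted in its own group.

```sql
-- Sketch: running requests per workload group at this moment.
-- Assumption: sys.dm_pdw_exec_requests has a group_name column.
SELECT   group_name,
         COUNT(*) AS running_requests
FROM     sys.dm_pdw_exec_requests
WHERE    status = 'Running'      -- keep only requests that are currently executing
GROUP BY group_name
ORDER BY running_requests DESC;
```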
Workload group allocation by max resource percent
This metric displays the percentage allocation of resources relative to the Effective cap resource percent per workload group. It shows the effective utilization of the workload group.

Consider a workload group DataLoads with an Effective cap resource percent of 75% and REQUEST_MIN_RESOURCE_GRANT_PERCENT configured at 25%. If a single query were running in this workload group, the Workload group allocation by max resource percent value filtered to DataLoads would be 33% (25% / 75%).

Use this metric to identify a workload group's utilization. A value close to 100% indicates that all resources available to the workload group are being used. Additionally, if the Workload group queued queries metric for the same workload group shows a value greater than zero, the workload group would use additional resources if they were allocated. Conversely, if this metric is consistently low and Workload group active queries is also low, the workload group is not being utilized. This situation is especially problematic if Effective min resource percent is greater than zero, as that indicates underutilized workload isolation.
Aggregation types: Min, Avg, Max
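The DataLoads numbers can be generalized. Assuming each of the n queries running in the group is granted exactly REQUEST_MIN_RESOURCE_GRANT_PERCENT (r) and the group's Effective cap resource percent is c, the metric is roughly the granted share divided by the cap. With c = 75 and r = 25, this reproduces the 33% single-query value and shows the group reaching 100% at three concurrent queries:

```latex
\text{allocation by max resource percent} \approx \frac{n \cdot r}{c} \times 100
\quad\Rightarrow\quad
n = 1:\ \frac{25}{75} \times 100 \approx 33\%,
\qquad
n = 3:\ \frac{75}{75} \times 100 = 100\%
```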
Workload group allocation by system percent
This metric displays the percentage allocation of resources relative to the entire system.

Consider a workload group DataLoads with REQUEST_MIN_RESOURCE_GRANT_PERCENT configured at 25%. If a single query were running in this workload group, the Workload group allocation by system percent value filtered to DataLoads would be 25% (25% / 100%).
Aggregation types: Min, Avg, Max
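Under the same simplifying assumptions (n running queries, each granted r percent, effective cap c), the two allocation metrics differ only in their denominator, so one can be converted into the other. For DataLoads, 25% of the system measured against a 75% cap gives back the 33% value from the allocation by max resource percent example:

```latex
\text{allocation by system percent} \approx \frac{n \cdot r}{100} \times 100 = n \cdot r,
\qquad
\text{allocation by max resource percent} = \text{allocation by system percent} \times \frac{100}{c}
= 25 \times \frac{100}{75} \approx 33\%
```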
Workload group query timeouts
Queries for the workload group that have timed out. This metric counts a timeout only once the query has started executing; it does not include wait time due to locking or resource waits.

Query timeout is configured using the QUERY_EXECUTION_TIMEOUT_SEC parameter in the CREATE WORKLOAD GROUP syntax. Increasing the value could reduce the number of query timeouts.

Consider increasing the REQUEST_MIN_RESOURCE_GRANT_PERCENT parameter for the workload group to allocate more resources per query, which can reduce the number of timeouts. Note that increasing REQUEST_MIN_RESOURCE_GRANT_PERCENT reduces the concurrency available to the workload group.
Aggregation type: Sum
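Both timeout levers described above are parameters of CREATE WORKLOAD GROUP. The sketch below shows them together on a hypothetical group named wgReports; the group name and every value are illustrative assumptions, not recommendations.

```sql
-- Hypothetical workload group; name and values are illustrative only.
CREATE WORKLOAD GROUP wgReports
WITH ( MIN_PERCENTAGE_RESOURCE = 0
      ,CAP_PERCENTAGE_RESOURCE = 40
      -- A larger per-query grant gives each query more resources,
      -- but allows at most 40 / 10 = 4 concurrent queries in this group.
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10
      -- Time out queries that execute for more than one hour.
      ,QUERY_EXECUTION_TIMEOUT_SEC = 3600);
```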
Workload group queued queries
Queries for the workload group that are currently queued, waiting to start execution. Queries can be queued because they are waiting for resources or locks.

Queries could be waiting for numerous reasons. If the system is overloaded and the concurrency demand is greater than what is available, queries will queue.

Consider adding more resources to the workload group by increasing the CAP_PERCENTAGE_RESOURCE parameter in the CREATE WORKLOAD GROUP statement. If CAP_PERCENTAGE_RESOURCE is greater than the Effective cap resource percent metric, the workload isolation configured for other workload groups is reducing the resources allocated to this workload group. Consider lowering the MIN_PERCENTAGE_RESOURCE of other workload groups or scaling up the instance to add more resources.
Aggregation type: Sum

Monitoring scenarios and actions

Below are a series of chart configurations that show how to use workload management metrics for troubleshooting, along with actions to address each issue.

Underutilized workload isolation

Consider the following workload group and classifier configuration, where a workload group named wgPriority is created and the TheCEO membername is mapped to it using the wcCEOPriority workload classifier. The wgPriority workload group has 25% workload isolation configured (MIN_PERCENTAGE_RESOURCE = 25). Each query submitted by TheCEO is given 5% of system resources (REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5).

CREATE WORKLOAD GROUP wgPriority
WITH ( MIN_PERCENTAGE_RESOURCE = 25
      ,CAP_PERCENTAGE_RESOURCE = 50
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);

CREATE WORKLOAD CLASSIFIER wcCEOPriority
WITH ( WORKLOAD_GROUP = 'wgPriority'
      ,MEMBERNAME = 'TheCEO');
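Applying the concurrency arithmetic from the metric definitions to this configuration gives the range the chart should be read against. This assumes the effective values equal the configured ones (MIN_PERCENTAGE_RESOURCE = 25 and CAP_PERCENTAGE_RESOURCE = 50), that is, no service-level rounding and no other group's isolation reducing the cap:

```latex
\text{guaranteed concurrent queries} \approx \left\lfloor \frac{25}{5} \right\rfloor = 5,
\qquad
\text{max concurrent queries} \approx \left\lfloor \frac{50}{5} \right\rfloor = 10
```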

The chart below is configured as follows:
Metric 1: Effective min resource percent (Avg aggregation, blue line)
Metric 2: Workload group allocation by system percent (Avg aggregation, purple line)
Filter: [Workload Group] = wgPriority
[Figure: underutilized-wg.png]
The chart shows that with 25% workload isolation configured, only 10% is being used on average. In this case, the MIN_PERCENTAGE_RESOURCE parameter value could be lowered to between 10 and 15, allowing other workloads on the system to consume the freed resources.
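One way to apply that change is to re-create the group with a lower MIN_PERCENTAGE_RESOURCE. The sketch below drops and re-creates the objects rather than altering them in place, and drops the classifier first because it references the group. The value 15 is one point in the suggested range; it is kept a multiple of the 5% per-query grant so that it still corresponds to a whole number of guaranteed queries (15 / 5 = 3).

```sql
-- Sketch: lower wgPriority's isolation from 25% to 15%.
-- Drop the classifier first; it references the workload group.
DROP WORKLOAD CLASSIFIER wcCEOPriority;
DROP WORKLOAD GROUP wgPriority;

CREATE WORKLOAD GROUP wgPriority
WITH ( MIN_PERCENTAGE_RESOURCE = 15          -- was 25; frees 10% for other workloads
      ,CAP_PERCENTAGE_RESOURCE = 50
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5);

-- Re-create the classifier so TheCEO's queries are routed to the group again.
CREATE WORKLOAD CLASSIFIER wcCEOPriority
WITH ( WORKLOAD_GROUP = 'wgPriority'
      ,MEMBERNAME = 'TheCEO');
```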

Workload group bottleneck

Consider the following workload group and classifier configuration, where a workload group named wgDataAnalyst is created and the DataAnalyst membername is mapped to it using the wcDataAnalyst workload classifier. The wgDataAnalyst workload group has 6% workload isolation configured (MIN_PERCENTAGE_RESOURCE = 6) and a resource limit of 9% (CAP_PERCENTAGE_RESOURCE = 9). Each query submitted by DataAnalyst is given 3% of system resources (REQUEST_MIN_RESOURCE_GRANT_PERCENT = 3).

CREATE WORKLOAD GROUP wgDataAnalyst  
WITH ( MIN_PERCENTAGE_RESOURCE = 6
      ,CAP_PERCENTAGE_RESOURCE = 9
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 3);

CREATE WORKLOAD CLASSIFIER wcDataAnalyst
WITH ( WORKLOAD_GROUP = 'wgDataAnalyst'
      ,MEMBERNAME = 'DataAnalyst');
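With a 3% grant per query, this configuration bounds the group tightly. Assuming the effective values match the configured 6% minimum and 9% cap (no service-level rounding):

```latex
\text{guaranteed concurrent queries} \approx \left\lfloor \frac{6}{3} \right\rfloor = 2,
\qquad
\text{max concurrent queries} \approx \left\lfloor \frac{9}{3} \right\rfloor = 3
```

A fourth simultaneous query from DataAnalyst would therefore wait in the queue.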

The chart below is configured as follows:
Metric 1: Effective cap resource percent (Avg aggregation, blue line)
Metric 2: Workload group allocation by max resource percent (Avg aggregation, purple line)
Metric 3: Workload group queued queries (Sum aggregation, turquoise line)
Filter: [Workload Group] = wgDataAnalyst
[Figure: bottle-necked-wg]
The chart shows that with a 9% cap on resources, the workload group is more than 90% utilized (from the Workload group allocation by max resource percent metric). There is a steady queuing of queries, as shown by the Workload group queued queries metric. In this case, increasing CAP_PERCENTAGE_RESOURCE to a value higher than 9% will allow more queries to execute concurrently. Increasing CAP_PERCENTAGE_RESOURCE assumes that enough resources are available and not isolated by other workload groups; verify that the cap increased by checking the Effective cap resource percent metric. If more throughput is desired, also consider increasing REQUEST_MIN_RESOURCE_GRANT_PERCENT to a value greater than 3. Increasing REQUEST_MIN_RESOURCE_GRANT_PERCENT could allow queries to run faster.
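A sketch of raising the cap, again by dropping and re-creating the group and its classifier. The new cap of 12% is a hypothetical choice that keeps the cap a multiple of the 3% per-query grant, raising the upper bound from 9 / 3 = 3 to 12 / 3 = 4 concurrent queries. Whether the full 12% is actually usable still depends on the isolation configured for other groups, which the Effective cap resource percent metric will show.

```sql
-- Sketch: raise wgDataAnalyst's cap from 9% to a hypothetical 12%.
-- Drop the classifier first; it references the workload group.
DROP WORKLOAD CLASSIFIER wcDataAnalyst;
DROP WORKLOAD GROUP wgDataAnalyst;

CREATE WORKLOAD GROUP wgDataAnalyst
WITH ( MIN_PERCENTAGE_RESOURCE = 6
      ,CAP_PERCENTAGE_RESOURCE = 12           -- was 9: room for 4 concurrent 3% queries instead of 3
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 3);

-- Re-create the classifier so DataAnalyst's queries are routed to the group again.
CREATE WORKLOAD CLASSIFIER wcDataAnalyst
WITH ( WORKLOAD_GROUP = 'wgDataAnalyst'
      ,MEMBERNAME = 'DataAnalyst');
```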

Next steps