Balancing of subclustered metrics

What is subclustering

Subclustering happens when services with different placement constraints share a common metric and each of them reports load for it. If the load reported by the services differs significantly, the total load on the nodes will have a large standard deviation, and the cluster will look imbalanced even when it has the best possible balance.

How subclustering affects load balancing

If the load reported by the services on different nodes differs significantly, it may look like there is a large imbalance where there is none. Also, if the false imbalance caused by subclustering is larger than the actual imbalance, it can confuse the Resource Manager balancing algorithm and produce a suboptimal balance in the cluster.

For example, suppose we have four services that all report load for the metric Metric1:

  • Service A - has the placement constraint "NodeType==Frontend", reports a load of 10
  • Service B - has the placement constraint "NodeType==Frontend", reports a load of 10
  • Service C - has the placement constraint "NodeType==Backend", reports a load of 100
  • Service D - has the placement constraint "NodeType==Backend", reports a load of 100

We also have four nodes. Two of them have NodeType set to "Frontend" and the other two have it set to "Backend".

The services are placed as follows:

Figure: Subclustering placement example

The cluster may look unbalanced because nodes 3 and 4 carry a large load, but this placement is the best possible balance in this situation.
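To see why the raw numbers are misleading, the following minimal sketch compares the standard deviation of Metric1 across all four nodes with the standard deviation inside each node type. It assumes the placement puts one service on each node (A and B on the Frontend nodes 1 and 2, C and D on the Backend nodes 3 and 4). The variable names are illustrative only and are not part of Service Fabric.

```python
from statistics import pstdev

# Assumed placement: one service per node, so node load equals service load.
node_type = {1: "Frontend", 2: "Frontend", 3: "Backend", 4: "Backend"}
node_load = {1: 10, 2: 10, 3: 100, 4: 100}

# Spread over the whole cluster, ignoring placement constraints.
global_stdev = pstdev(list(node_load.values()))

# Spread inside each subcluster (nodes that share the same NodeType).
subcluster_stdev = {}
for t in set(node_type.values()):
    loads = [node_load[n] for n in node_load if node_type[n] == t]
    subcluster_stdev[t] = pstdev(loads)

print(global_stdev)      # 45.0
print(subcluster_stdev)  # both subclusters have a deviation of 0.0
```

The global deviation is large, while each subcluster is perfectly even, which is the "false imbalance" described above.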

Resource Manager can recognize subclustering situations, and in almost all cases it can produce the optimal balance for the given situation.

In the exceptional situations where Resource Manager cannot optimally balance a subclustered metric, it still detects the subclustering and generates a health report advising you to fix the problem.

Types of subclustering and how they are handled

Subclustering situations can be classified into three categories. The category of a specific subclustering situation determines how Resource Manager handles it.

First category - flat subclustering with disjoint node groups

This is the simplest form of subclustering: nodes can be separated into groups, and each service can only be placed onto nodes in one of those groups. Each node belongs to exactly one group. The situation described above belongs to this category, as do most subclustering situations.

For situations in this category, Resource Manager can produce the optimal balance and no further intervention is needed.

Second category - subclustering with hierarchical node groups

This situation happens when the group of nodes allowed for one service is a subset of the group of nodes allowed for another service. The most common example is when one service has a placement constraint defined and another service has no placement constraint and can be placed on any node.

Example:

  • Service A: no placement constraint
  • Service B: placement constraint "NodeType==Frontend"
  • Service C: placement constraint "NodeType==Backend"

This configuration creates a subset-superset relation between the node groups of the different services.

Figure: Subset-superset subclustering

In this situation, Resource Manager may produce a suboptimal balance.

Resource Manager will recognize this situation and produce a health report advising you to split Service A into two services: Service A1, which can be placed on Frontend nodes, and Service A2, which can be placed on Backend nodes. This brings the situation back to the first category, which can be balanced optimally.

Third category - subclustering with partial overlap between node sets

This situation happens when the sets of nodes onto which some services can be placed partially overlap.

For example, suppose we have a node property called NodeColor and three nodes:

  • Node 1: NodeColor=Red
  • Node 2: NodeColor=Blue
  • Node 3: NodeColor=Green

And we have two services:

  • Service A: placement constraint "NodeColor==Red || NodeColor==Blue"
  • Service B: placement constraint "NodeColor==Blue || NodeColor==Green"

Because of this, Service A can be placed on nodes 1 and 2, and Service B can be placed on nodes 2 and 3.
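The node sets in this example can be derived mechanically. The sketch below models each placement constraint as a plain Python predicate over the node properties; these predicates are hypothetical stand-ins for the constraint expressions and do not reflect how Service Fabric evaluates them.

```python
# Node properties from the example.
nodes = {
    1: {"NodeColor": "Red"},
    2: {"NodeColor": "Blue"},
    3: {"NodeColor": "Green"},
}

# Hypothetical predicates standing in for the two placement constraints.
constraints = {
    "A": lambda props: props["NodeColor"] in ("Red", "Blue"),
    "B": lambda props: props["NodeColor"] in ("Blue", "Green"),
}

# For each service, collect the nodes whose properties satisfy its constraint.
allowed = {
    service: {n for n, props in nodes.items() if is_allowed(props)}
    for service, is_allowed in constraints.items()
}

# Nodes that both services can use.
shared = allowed["A"] & allowed["B"]
print(shared)  # {2}
```

Node 2 is shared, but neither node set contains the other, which is what distinguishes partial overlap from the hierarchical case.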

In this situation, Resource Manager may produce a suboptimal balance.

Resource Manager will recognize this situation and produce a health report advising you to split some of the services.

For this situation, Resource Manager cannot propose how to split the services, because multiple splits are possible and there is no way to estimate which one would be optimal.
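The three categories can be told apart by comparing the node sets allowed for each service. The following is a rough sketch of that comparison, not Resource Manager's actual detection logic. The function name `classify`, its return labels, and the rule that partial overlap takes precedence when several relations are present are all assumptions made for illustration.

```python
from itertools import combinations

def classify(allowed):
    """Classify a subclustering situation from each service's allowed node set.

    Returns "flat", "hierarchical" or "partial-overlap".
    """
    groups = {frozenset(s) for s in allowed.values()}  # distinct node groups
    result = "flat"
    for a, b in combinations(groups, 2):
        if a.isdisjoint(b):
            continue                  # disjoint groups are fine
        if a < b or b < a:
            result = "hierarchical"   # one group contains the other
        else:
            return "partial-overlap"  # groups share some, but not all, nodes
    return result

# Node sets for the three examples (four nodes: 1-2 Frontend, 3-4 Backend, assumed).
flat = {"A": {1, 2}, "B": {1, 2}, "C": {3, 4}, "D": {3, 4}}
hierarchical = {"A": {1, 2, 3, 4}, "B": {1, 2}, "C": {3, 4}}
after_split = {"A1": {1, 2}, "A2": {3, 4}, "B": {1, 2}, "C": {3, 4}}
partial = {"A": {1, 2}, "B": {2, 3}}

for name, case in [("flat", flat), ("hierarchical", hierarchical),
                   ("after_split", after_split), ("partial", partial)]:
    print(name, "->", classify(case))
```

Note that splitting Service A into A1 and A2 turns the hierarchical example back into a flat one, matching the advice in the health report.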

Configuring subclustering

The behavior of Resource Manager with respect to subclustering can be changed through the following configuration parameters:

  • SubclusteringEnabled - determines whether Resource Manager takes subclustering into account when doing load balancing. If this parameter is turned off, Resource Manager ignores subclustering and tries to achieve optimal balance on a global level. The default value of this parameter is false.
  • SubclusteringReportingPolicy - determines how Resource Manager emits health reports for hierarchical and partial-overlap subclustering. A value of "0" turns health reports about subclustering off, "1" produces warning health reports for suboptimal subclustering situations, and "2" produces "OK" health reports. The default value of this parameter is "1".

ClusterManifest.xml:

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="SubclusteringEnabled" Value="true" />
    <Parameter Name="SubclusteringReportingPolicy" Value="1" />
</Section>

Via ClusterConfig.json for standalone deployments or Template.json for Azure-hosted clusters:

"fabricSettings": [
  {
    "name": "PlacementAndLoadBalancing",
    "parameters": [
      {
          "name": "SubclusteringEnabled",
          "value": "true"
      },
      {
          "name": "SubclusteringReportingPolicy",
          "value": "1"
      }
    ]
  }
]
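As a quick sanity check before deploying, the two parameters can be read back from the JSON and validated. The sketch below is only an illustration: the helper `subclustering_settings` is hypothetical, and it assumes the "fabricSettings" array is passed at the top level of the object, which may not match where it sits in a full ClusterConfig.json or Template.json.

```python
import json

# Fragment mirroring the "fabricSettings" layout (top-level placement is assumed).
fragment = json.loads("""
{
  "fabricSettings": [
    {
      "name": "PlacementAndLoadBalancing",
      "parameters": [
        {"name": "SubclusteringEnabled", "value": "true"},
        {"name": "SubclusteringReportingPolicy", "value": "1"}
      ]
    }
  ]
}
""")

# Meaning of each reporting policy value, as described in this article.
POLICY_MEANING = {
    "0": "health reports off",
    "1": "warning health reports",
    "2": "OK health reports",
}

def subclustering_settings(config):
    """Return the subclustering parameters, using the documented defaults if absent."""
    settings = {"SubclusteringEnabled": "false", "SubclusteringReportingPolicy": "1"}
    for section in config.get("fabricSettings", []):
        if section["name"] == "PlacementAndLoadBalancing":
            for param in section["parameters"]:
                if param["name"] in settings:
                    settings[param["name"]] = param["value"]
    if settings["SubclusteringReportingPolicy"] not in POLICY_MEANING:
        raise ValueError("SubclusteringReportingPolicy must be 0, 1 or 2")
    return settings

settings = subclustering_settings(fragment)
print(settings["SubclusteringEnabled"])
print(POLICY_MEANING[settings["SubclusteringReportingPolicy"]])
```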

Next steps

  • To find out how the Cluster Resource Manager manages and balances load in the cluster, check out the article on balancing load
  • To find out how your services can be constrained to be placed only on certain nodes, see Node properties and placement constraints