Balancing your Service Fabric cluster

The Service Fabric Cluster Resource Manager supports dynamic load changes and reacts to the addition or removal of nodes or services. It also automatically corrects constraint violations and proactively rebalances the cluster. But how often are these actions taken, and what triggers them?

The Cluster Resource Manager performs three different categories of work:

  1. Placement - this stage deals with placing any stateful replicas or stateless instances that are missing. Placement includes both new services and handling stateful replicas or stateless instances that have failed. Deleting and dropping replicas or instances are also handled here.
  2. Constraint Checks - this stage checks for and corrects violations of the different placement constraints (rules) within the system. Examples of rules include ensuring that nodes are not over capacity and that a service's placement constraints are met.
  3. Balancing - this stage checks whether rebalancing is necessary, based on the configured desired level of balance for different metrics. If so, it attempts to find an arrangement in the cluster that is more balanced.

Configuring Cluster Resource Manager Timers

The first set of controls around balancing is a set of timers. These timers govern how often the Cluster Resource Manager examines the cluster and takes corrective actions.

Each type of correction that the Cluster Resource Manager can make is controlled by a different timer that governs its frequency. When a timer fires, the corresponding task is scheduled. By default, the Cluster Resource Manager:

  • scans its state and applies updates (like recording that a node is down) every 1/10th of a second
  • sets the placement check flag every second
  • sets the constraint check flag every second
  • sets the balancing flag every five seconds

Examples of the configuration governing these timers are below:

ClusterManifest.xml:

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="PLBRefreshGap" Value="0.1" />
    <Parameter Name="MinPlacementInterval" Value="1.0" />
    <Parameter Name="MinConstraintCheckInterval" Value="1.0" />
    <Parameter Name="MinLoadBalancingInterval" Value="5.0" />
</Section>

Via ClusterConfig.json for standalone deployments or Template.json for Azure-hosted clusters:

"fabricSettings": [
  {
    "name": "PlacementAndLoadBalancing",
    "parameters": [
      {
          "name": "PLBRefreshGap",
          "value": "0.10"
      },
      {
          "name": "MinPlacementInterval",
          "value": "1.0"
      },
      {
          "name": "MinConstraintCheckInterval",
          "value": "1.0"
      },
      {
          "name": "MinLoadBalancingInterval",
          "value": "5.0"
      }
    ]
  }
]

Today the Cluster Resource Manager performs only one of these actions at a time, sequentially. This is why we refer to these timers as "minimum intervals" and to the actions taken when the timers go off as "setting flags". For example, the Cluster Resource Manager takes care of pending requests to create services before balancing the cluster. As the default time intervals show, the Cluster Resource Manager scans frequently for anything it needs to do. Normally this means that the set of changes made during each step is small. Making small changes frequently allows the Cluster Resource Manager to be responsive when things happen in the cluster. The default timers also provide some batching, since many events of the same type tend to occur simultaneously.

For example, when nodes fail, entire fault domains can fail at once. All these failures are captured during the next state update after the PLBRefreshGap. The corrections are determined during the following placement, constraint check, and balancing runs. By default the Cluster Resource Manager does not scan through hours of changes in the cluster and try to address them all at once. Doing so would lead to bursts of churn.
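The relationship between timers, flags, and sequential execution can be pictured with a small model. This is only a sketch under stated assumptions, not the Cluster Resource Manager's implementation: the `FlagScheduler` class and its methods are hypothetical, the interval values are the defaults listed above, and the position of constraint checks in the priority order is an assumption (the text only says that pending service creation is handled before balancing).

```python
from dataclasses import dataclass, field

# Default minimum intervals in seconds, taken from the configuration above.
INTERVALS = {
    "placement": 1.0,         # MinPlacementInterval
    "constraint_check": 1.0,  # MinConstraintCheckInterval
    "balancing": 5.0,         # MinLoadBalancingInterval
}

# Assumed order for this sketch: placement first, balancing last.
PRIORITY = ["placement", "constraint_check", "balancing"]


@dataclass
class FlagScheduler:
    """Hypothetical model: timers set flags; one flagged phase runs per step."""
    last_fired: dict = field(default_factory=lambda: {k: 0.0 for k in INTERVALS})
    flags: set = field(default_factory=set)

    def tick(self, now: float) -> None:
        """Set the flag of every phase whose minimum interval has elapsed."""
        for phase, interval in INTERVALS.items():
            if now - self.last_fired[phase] >= interval:
                self.flags.add(phase)
                self.last_fired[phase] = now

    def run_next(self):
        """Clear and return at most one flagged phase, highest priority first."""
        for phase in PRIORITY:
            if phase in self.flags:
                self.flags.discard(phase)
                return phase
        return None


scheduler = FlagScheduler()
scheduler.tick(now=5.0)  # all three minimum intervals have elapsed
order = [scheduler.run_next() for _ in range(4)]
print(order)  # ['placement', 'constraint_check', 'balancing', None]
```

Because a timer only sets a flag, the configured values act as lower bounds on how often a phase runs rather than as exact periods, which is the sense of "minimum interval" used above.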

The Cluster Resource Manager also needs some additional information to determine if the cluster is imbalanced. For that we have two other pieces of configuration: BalancingThresholds and ActivityThresholds.

Balancing thresholds

A Balancing Threshold is the main control for triggering rebalancing. The Balancing Threshold for a metric is a ratio. If the load for a metric on the most loaded node divided by the load on the least loaded node exceeds that metric's BalancingThreshold, then the cluster is imbalanced. As a result, balancing is triggered the next time the Cluster Resource Manager checks. The MinLoadBalancingInterval timer defines how often the Cluster Resource Manager should check if rebalancing is necessary. Checking doesn't mean that anything happens.

Balancing Thresholds are defined on a per-metric basis as a part of the cluster definition. For more information on metrics, check out this article.

ClusterManifest.xml:

<Section Name="MetricBalancingThresholds">
  <Parameter Name="MetricName1" Value="2"/>
  <Parameter Name="MetricName2" Value="3.5"/>
</Section>

Via ClusterConfig.json for standalone deployments or Template.json for Azure-hosted clusters:

"fabricSettings": [
  {
    "name": "MetricBalancingThresholds",
    "parameters": [
      {
          "name": "MetricName1",
          "value": "2"
      },
      {
          "name": "MetricName2",
          "value": "3.5"
      }
    ]
  }
]

Figure: Balancing threshold example

In this example, each service consumes one unit of some metric. In the top example, the maximum load on a node is five and the minimum is two. Let's say that the balancing threshold for this metric is three. Since the ratio in the cluster is 5/2 = 2.5, which is less than the specified balancing threshold of three, the cluster is balanced. No balancing is triggered when the Cluster Resource Manager checks.

In the bottom example, the maximum load on a node is 10, while the minimum is two, resulting in a ratio of five. Five is greater than the designated balancing threshold of three for that metric. As a result, a rebalancing run is scheduled the next time the balancing timer fires. In a situation like this, some load is usually distributed to Node3. Because the Service Fabric Cluster Resource Manager doesn't use a greedy approach, some load could also be distributed to Node2.

Figure: Balancing threshold example actions
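The arithmetic in the two examples can be reproduced with a few lines of Python. This is a minimal sketch, not the Cluster Resource Manager's code: `load_ratio` and `is_imbalanced` are hypothetical helper names, the handling of a node with zero load is an assumption of this sketch, and the per-node loads are illustrative (only the maximum and minimum values come from the text).

```python
def load_ratio(node_loads):
    """Load on the most loaded node divided by load on the least loaded node."""
    most = max(node_loads.values())
    least = min(node_loads.values())
    if least == 0:
        # Assumption of this sketch: an empty node makes the ratio unbounded,
        # unless every node is empty, in which case the loads are trivially even.
        return float("inf") if most > 0 else 1.0
    return most / least


def is_imbalanced(node_loads, balancing_threshold):
    """True when the ratio exceeds the metric's balancing threshold."""
    return load_ratio(node_loads) > balancing_threshold


# Illustrative loads: only the maximum and minimum are given in the text.
top = {"Node1": 5, "Node2": 3, "Node3": 2}
bottom = {"Node1": 10, "Node2": 4, "Node3": 2}

print(load_ratio(top), is_imbalanced(top, 3))        # 2.5 False
print(load_ratio(bottom), is_imbalanced(bottom, 3))  # 5.0 True
```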

Note

"Balancing" handles two different strategies for managing load in your cluster. The default strategy that the Cluster Resource Manager uses is to distribute load across the nodes in the cluster. The other strategy is defragmentation. Defragmentation is performed during the same balancing run. The balancing and defragmentation strategies can be used for different metrics within the same cluster. A service can have both balancing and defragmentation metrics. For defragmentation metrics, the ratio of the loads in the cluster triggers rebalancing when it is below the balancing threshold.

Getting below the balancing threshold is not an explicit goal. Balancing Thresholds are just a trigger. When balancing runs, the Cluster Resource Manager determines what improvements it can make, if any. Just because a balancing search is kicked off doesn't mean anything moves. Sometimes the cluster is imbalanced but too constrained to correct. Alternatively, the improvements may require movements that are too costly.

Activity thresholds

Sometimes, although nodes are relatively imbalanced, the total amount of load in the cluster is low. The lack of load could be a transient dip, or it could be because the cluster is new and just getting bootstrapped. In either case, you may not want to spend time balancing the cluster because there's little to be gained. If the cluster underwent balancing, you'd spend network and compute resources to move things around without making any large absolute difference. To avoid unnecessary moves, there's another control known as Activity Thresholds. Activity Thresholds allow you to specify an absolute lower bound for activity. If no node is over this threshold, balancing isn't triggered even if the Balancing Threshold is met.

Let's say that we retain our Balancing Threshold of three for this metric. Let's also say we have an Activity Threshold of 1536. In the first case, although the cluster is imbalanced per the Balancing Threshold, no node exceeds the Activity Threshold, so nothing happens. In the bottom example, Node1 is over the Activity Threshold. Since both the Balancing Threshold and the Activity Threshold for the metric are exceeded, balancing is scheduled. The following diagram shows both cases:

Figure: Activity threshold example

Just like Balancing Thresholds, Activity Thresholds are defined per-metric via the cluster definition:

ClusterManifest.xml:

    <Section Name="MetricActivityThresholds">
      <Parameter Name="Memory" Value="1536"/>
    </Section>

Via ClusterConfig.json for standalone deployments or Template.json for Azure-hosted clusters:

"fabricSettings": [
  {
    "name": "MetricActivityThresholds",
    "parameters": [
      {
          "name": "Memory",
          "value": "1536"
      }
    ]
  }
]

Balancing and activity thresholds are both tied to a specific metric - balancing is triggered only if both the Balancing Threshold and the Activity Threshold are exceeded for the same metric.

Note

When not specified, the Balancing Threshold for a metric is 1, and the Activity Threshold is 0. This means that the Cluster Resource Manager will try to keep that metric perfectly balanced for any given load. If you are using custom metrics, it is recommended that you explicitly define your own balancing and activity thresholds for your metrics.
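The way the two thresholds combine for a single metric can be sketched as follows. This is an illustration under assumptions rather than the Cluster Resource Manager's actual logic: `should_trigger_balancing` is a hypothetical name, the defaults of 1 and 0 come from the note above, and the node loads are made-up values chosen so that the ratio exceeds three in both cases.

```python
def should_trigger_balancing(node_loads, balancing_threshold=1.0,
                             activity_threshold=0.0):
    """Trigger only if both thresholds are exceeded for the same metric."""
    most = max(node_loads.values())
    least = min(node_loads.values())
    # At least one node must be over the activity threshold.
    over_activity = most > activity_threshold
    # Same test as most / least > balancing_threshold when least > 0,
    # written as a product so a node with zero load does not divide by zero.
    over_balancing = most > balancing_threshold * least
    return over_activity and over_balancing


# Hypothetical loads for a "Memory" metric; ratio is 4 and 5 respectively.
quiet = {"Node1": 1200, "Node2": 600, "Node3": 300}
busy = {"Node1": 2000, "Node2": 800, "Node3": 400}

print(should_trigger_balancing(quiet, 3, 1536))  # False: no node is over 1536
print(should_trigger_balancing(busy, 3, 1536))   # True: both thresholds exceeded
print(should_trigger_balancing(quiet))           # True: defaults react to any imbalance
```

With the default values, any difference between the most and least loaded node is enough to trigger a balancing run, which is why explicit thresholds are recommended for custom metrics.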

Balancing services together

Whether the cluster is imbalanced or not is a cluster-wide decision. However, the way we go about fixing it is by moving individual service replicas and instances around. This makes sense, right? If memory is stacked up on one node, multiple replicas or instances could be contributing to it. Fixing the imbalance could require moving any of the stateful replicas or stateless instances that use the imbalanced metric.

Occasionally though, a service that wasn't itself imbalanced gets moved (remember the discussion of local and global weights earlier). Why would a service get moved when all of that service's metrics were balanced? Let's see an example:

  • Let's say there are four services: Service1, Service2, Service3, and Service4.
  • Service1 reports metrics Metric1 and Metric2.
  • Service2 reports metrics Metric2 and Metric3.
  • Service3 reports metrics Metric3 and Metric4.
  • Service4 reports metric Metric99.

Surely you can see where we're going here: there's a chain! We don't really have four independent services; we have three services that are related and one that is off on its own.

Figure: Balancing services together

Because of this chain, it's possible that an imbalance in Metrics 1-4 can cause replicas or instances belonging to Services 1-3 to move around. We also know that an imbalance in Metrics 1, 2, or 3 can't cause movements in Service4. There would be no point, since moving the replicas or instances belonging to Service4 around can do absolutely nothing to impact the balance of Metrics 1-3.

The Cluster Resource Manager automatically figures out which services are related. Adding, removing, or changing the metrics for services can impact their relationships. For example, between two runs of balancing, Service2 may have been updated to remove Metric2. This breaks the chain between Service1 and Service2. Now instead of two groups of related services, there are three:

Figure: Balancing services together, after Metric2 is removed from Service2
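One way to picture how related services fall into groups is to treat services as nodes in a graph and link any two that report the same metric; each connected component is then a group that can be affected together. The sketch below uses a plain graph traversal for illustration. The function name `related_service_groups` is hypothetical, and nothing here is meant to describe how the Cluster Resource Manager computes these relationships internally.

```python
from collections import defaultdict


def related_service_groups(service_metrics):
    """Group services linked, directly or transitively, by shared metrics."""
    metric_to_services = defaultdict(set)
    for service, metrics in service_metrics.items():
        for metric in metrics:
            metric_to_services[metric].add(service)

    groups, seen = [], set()
    for start in service_metrics:
        if start in seen:
            continue
        # Depth-first walk over services reachable through shared metrics.
        group, stack = set(), [start]
        while stack:
            service = stack.pop()
            if service in group:
                continue
            group.add(service)
            for metric in service_metrics[service]:
                stack.extend(metric_to_services[metric] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


services = {
    "Service1": {"Metric1", "Metric2"},
    "Service2": {"Metric2", "Metric3"},
    "Service3": {"Metric3", "Metric4"},
    "Service4": {"Metric99"},
}
print(related_service_groups(services))
# [['Service1', 'Service2', 'Service3'], ['Service4']]

# Service2 is updated to stop reporting Metric2, which breaks the chain.
updated = dict(services, Service2={"Metric3"})
print(related_service_groups(updated))
# [['Service1'], ['Service2', 'Service3'], ['Service4']]
```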

Next steps

  • Metrics are how the Service Fabric Cluster Resource Manager manages consumption and capacity in the cluster. To learn more about metrics and how to configure them, check out this article
  • Movement Cost is one way of signaling to the Cluster Resource Manager that certain services are more expensive to move than others. For more about movement cost, refer to this article
  • The Cluster Resource Manager has several throttles that you can configure to slow down churn in the cluster. They're not normally necessary, but if you need them you can learn about them here
  • The Cluster Resource Manager can recognize and handle subclustering (a situation that sometimes arises when you use placement constraints and balancing). To learn how subclustering can affect balancing and how you can handle it, see here