Placement policies for Service Fabric services

Placement policies are additional rules that can be used to govern service placement in some specific, less common scenarios. Some examples of those scenarios are:

  • Your Service Fabric cluster spans geographic distances, such as multiple on-premises datacenters or multiple Azure regions
  • Your environment spans multiple areas of geopolitical or legal control, or some other case where you have policy boundaries you need to enforce
  • There are communication performance or latency considerations due to large distances or the use of slower or less reliable network links
  • You need to keep certain workloads co-located as a best effort, either with other workloads or in proximity to customers

Most of these requirements align with the physical layout of the cluster, which is represented as the cluster's fault domains.

The advanced placement policies that help address these scenarios are:

  1. Invalid domains
  2. Required domains
  3. Preferred domains
  4. Disallowing replica packing

Most of the following controls could be configured via node properties and placement constraints, but some are more complicated. To make things simpler, the Service Fabric Cluster Resource Manager provides these additional placement policies. Placement policies are configured per named service instance, and they can also be updated dynamically.

Specifying invalid domains

The InvalidDomain placement policy lets you specify that a particular fault domain is invalid for a specific service. This policy ensures that a particular service never runs in a particular area, for example for geopolitical or corporate policy reasons. Multiple invalid domains can be specified via separate policies.
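
Because fault domain names are hierarchical paths, an invalid domain such as fd:/DCEast presumably excludes every node whose fault domain lies beneath it, such as fd:/DCEast/RK01. The following C# sketch models that filtering under this assumption. The InvalidDomainFilter class and its methods are hypothetical and are not part of the Service Fabric API; the Cluster Resource Manager applies the real policy internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a Service Fabric API) that models how an
// InvalidDomain policy could exclude nodes, assuming hierarchical matching.
public static class InvalidDomainFilter
{
    // True if nodeDomain equals domain or lies beneath it in the hierarchy,
    // e.g. "fd:/DCEast/RK01" is under "fd:/DCEast", but "fd:/DCEast2" is not.
    public static bool IsUnder(string nodeDomain, string domain)
    {
        string prefix = domain.TrimEnd('/');
        return nodeDomain == prefix
            || nodeDomain.StartsWith(prefix + "/", StringComparison.Ordinal);
    }

    // A node stays eligible only if it is not under any invalid domain
    // (each invalid domain comes from its own policy).
    public static bool IsEligible(string nodeDomain, IEnumerable<string> invalidDomains) =>
        !invalidDomains.Any(d => IsUnder(nodeDomain, d));
}
```

For example, with invalid domains fd:/DCEast and fd:/DCNorth, a node in fd:/DCEast/RK01 is rejected, while a node in fd:/DCWest/RK02 stays eligible. Matching on a "/" boundary keeps a sibling domain such as fd:/DCEast2 from being excluded by accident.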

(Figure: Invalid domain example)

Code:

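using System.Fabric.Description;

// serviceDescription is an existing ServiceDescription (for example, a StatefulServiceDescription)
ServicePlacementInvalidDomainPolicyDescription invalidDomain = new ServicePlacementInvalidDomainPolicyDescription();
invalidDomain.DomainName = "fd:/DCEast"; // regulations prohibit this workload here
serviceDescription.PlacementPolicies.Add(invalidDomain);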

PowerShell:

New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("InvalidDomain,fd:/DCEast")

Specifying required domains

The RequiredDomain placement policy requires that the service be present only in the specified domain. Multiple required domains can be specified via separate policies.
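
A required domain works as the inverse filter: only nodes inside a listed domain are candidates. The C# sketch below models this with a hypothetical RequiredDomainFilter class (not a Service Fabric API). It assumes that fault domain paths are matched hierarchically and that, when several RequiredDomain policies are present, a node qualifies if it lies in any one of them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Service Fabric API.
// Assumption: with several required domains, a node qualifies if it is in any of them.
public static class RequiredDomainFilter
{
    // True if nodeDomain equals domain or lies beneath it in the hierarchy.
    static bool IsUnder(string nodeDomain, string domain)
    {
        string prefix = domain.TrimEnd('/');
        return nodeDomain == prefix
            || nodeDomain.StartsWith(prefix + "/", StringComparison.Ordinal);
    }

    // Returns the names of nodes (sorted) whose fault domain lies in a required domain.
    public static List<string> EligibleNodes(
        IDictionary<string, string> nodeDomains,   // node name -> fault domain path
        IReadOnlyCollection<string> requiredDomains)
    {
        return nodeDomains
            .Where(n => requiredDomains.Any(d => IsUnder(n.Value, d)))
            .Select(n => n.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

With the single required domain fd:/DC01/RK03/BL2 used in the following example, only nodes in that domain, or beneath it, would be returned.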

(Figure: Required domain example)

Code:

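using System.Fabric.Description;

// serviceDescription is an existing ServiceDescription (for example, a StatefulServiceDescription)
ServicePlacementRequiredDomainPolicyDescription requiredDomain = new ServicePlacementRequiredDomainPolicyDescription();
requiredDomain.DomainName = "fd:/DC01/RK03/BL2"; // the service may run only in this domain
serviceDescription.PlacementPolicies.Add(requiredDomain);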

PowerShell:

New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("RequiredDomain,fd:/DC01/RK03/BL2")

Specifying a preferred domain for the primary replicas of a stateful service

The preferred primary domain specifies the fault domain in which to place the primary replica. When everything is healthy, the primary ends up in this domain. If the domain or the primary replica fails or shuts down, the primary moves to some other location, ideally in the same domain. If this new location isn't in the preferred domain, the Cluster Resource Manager moves the primary back to the preferred domain as soon as possible. Naturally, this setting only makes sense for stateful services. This policy is most useful in clusters that span Azure regions or multiple datacenters but have services that prefer placement in a certain location. Keeping primaries close to their users or other services helps lower latency, especially for reads, which primaries handle by default.
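
The failover behavior described above can be sketched as a simple selection rule. The following C# example is a simplified, hypothetical model (the Replica record and PreferredPrimary class are not Service Fabric types): it prefers a healthy replica in the preferred domain, falls back to any healthy replica, and reports when the primary should move back. It ignores load, capacity, and the other constraints the Cluster Resource Manager also weighs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a replica; not a Service Fabric type.
public sealed record Replica(string NodeName, string FaultDomain, bool IsHealthy);

public static class PreferredPrimary
{
    // True if nodeDomain equals domain or lies beneath it in the hierarchy.
    static bool IsUnder(string nodeDomain, string domain)
    {
        string prefix = domain.TrimEnd('/');
        return nodeDomain == prefix
            || nodeDomain.StartsWith(prefix + "/", StringComparison.Ordinal);
    }

    // Picks a healthy replica inside the preferred domain when one exists;
    // otherwise any healthy replica, or null if none is healthy.
    public static Replica? ChoosePrimary(IEnumerable<Replica> replicas, string preferredDomain)
    {
        var healthy = replicas.Where(r => r.IsHealthy).ToList();
        return healthy.FirstOrDefault(r => IsUnder(r.FaultDomain, preferredDomain))
            ?? healthy.FirstOrDefault();
    }

    // True when the primary sits outside the preferred domain while a healthy
    // replica inside it is available, i.e. the primary should be moved back.
    public static bool ShouldMoveBack(Replica primary, IEnumerable<Replica> replicas, string preferredDomain) =>
        !IsUnder(primary.FaultDomain, preferredDomain)
        && replicas.Any(r => r.IsHealthy && IsUnder(r.FaultDomain, preferredDomain));
}
```

For example, if the primary had failed over to a replica in fd:/ChinaNorth while a healthy replica in fd:/ChinaEast exists, ShouldMoveBack returns true for the preferred domain fd:/ChinaEast.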

(Figure: Preferred primary domains and failover)

Code:

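using System.Fabric.Description;

// serviceDescription is an existing StatefulServiceDescription; this policy applies only to stateful services
ServicePlacementPreferPrimaryDomainPolicyDescription primaryDomain = new ServicePlacementPreferPrimaryDomainPolicyDescription();
primaryDomain.DomainName = "fd:/ChinaEast"; // same domain as in the PowerShell example
serviceDescription.PlacementPolicies.Add(primaryDomain);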

PowerShell:

New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("PreferredPrimaryDomain,fd:/ChinaEast")

Requiring replica distribution and disallowing packing

Replicas are normally distributed across fault and upgrade domains when the cluster is healthy. However, there are cases where more than one replica of a given partition may end up temporarily packed into a single domain. For example, say the cluster has nine nodes in three fault domains: fd:/0, fd:/1, and fd:/2, and your service has three replicas, one in each fault domain. Now say the nodes hosting the replicas in fd:/1 and fd:/2 go down. Normally the Cluster Resource Manager would prefer other nodes in those same fault domains, but suppose that, due to capacity issues, none of the other nodes in those domains are valid. If the Cluster Resource Manager builds replacements for those replicas, it has to choose nodes in fd:/0. However, doing so violates the fault domain constraint. Packing replicas increases the chance that the whole replica set could go down or be lost.

Note

For more information on constraints and constraint priorities in general, see this topic.

If you've ever seen a health message such as "The Load Balancer has detected a Constraint Violation for this Replica:fabric:/<some service name> Secondary Partition <some partition ID> is violating the Constraint: FaultDomain", then you've hit this condition or something like it. Usually only one or two replicas are packed together temporarily. As long as fewer than a quorum of replicas are in any given domain, you're safe. Packing is rare, but it can happen, and these situations are usually transient because the nodes come back. If the nodes do stay down and the Cluster Resource Manager needs to build replacements, there are usually other valid nodes available in the ideal fault domains.
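
The quorum rule above can be checked mechanically. The following C# sketch assumes a majority quorum of floor(N/2) + 1 for N replicas; the PackingCheck class is hypothetical and not part of Service Fabric. It groups a partition's replicas by fault domain, reports whether any domain is packed, and treats the set as safe only while every domain holds fewer than a quorum of replicas, so that losing any single domain cannot remove a full quorum.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical check, not a Service Fabric API. Input: the fault domain of
// each replica in one partition (assumes at least one replica).
public static class PackingCheck
{
    // Assumed majority quorum: floor(N/2) + 1.
    public static int Quorum(int replicaCount) => replicaCount / 2 + 1;

    // Largest number of replicas sharing a single fault domain.
    public static int MaxReplicasInOneDomain(IReadOnlyList<string> replicaDomains) =>
        replicaDomains.GroupBy(d => d).Max(g => g.Count());

    // Packed: some domain hosts more than one replica of the partition.
    public static bool IsPacked(IReadOnlyList<string> replicaDomains) =>
        MaxReplicasInOneDomain(replicaDomains) > 1;

    // Safe: fewer than a quorum of replicas share any one domain.
    public static bool IsSafe(IReadOnlyList<string> replicaDomains) =>
        MaxReplicasInOneDomain(replicaDomains) < Quorum(replicaDomains.Count);
}
```

In the nine-node scenario above, all three replicas would end up in fd:/0 against a quorum of two, so the check reports the set as unsafe. With five replicas across four domains and two of them packed into fd:/0, the set is packed but still safe.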

Some workloads would prefer always having the target number of replicas, even if they are packed into fewer domains. These workloads are betting against total simultaneous permanent domain failures and can usually recover local state. Other workloads would rather take the downtime earlier than risk correctness or data loss. Most production workloads run with more than three replicas, more than three fault domains, and many valid nodes per fault domain. Because of this, the default behavior allows domain packing: normal balancing and failover handle these extreme cases, even if that means temporary domain packing.

If you want to disable such packing for a given workload, you can specify the RequireDomainDistribution policy on the service. When this policy is set, the Cluster Resource Manager ensures that no two replicas of the same partition run in the same fault or upgrade domain.
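
The rule that RequireDomainDistribution enforces can be sketched as a placement check. In the following C# example, the Node record and DistributionRule class are hypothetical, not Service Fabric APIs, and fault and upgrade domains are compared as exact strings for simplicity. A candidate is rejected if any replica of the same partition already runs in its fault domain or its upgrade domain; when no candidate passes, the sketch returns null, which models leaving the replica unplaced rather than packing it.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical node description; not a Service Fabric type.
public sealed record Node(string Name, string FaultDomain, string UpgradeDomain);

public static class DistributionRule
{
    // A candidate is acceptable only if no replica of the same partition
    // already runs in its fault domain or in its upgrade domain.
    public static bool CanPlace(Node candidate, IEnumerable<Node> partitionReplicaNodes) =>
        partitionReplicaNodes.All(n =>
            n.FaultDomain != candidate.FaultDomain &&
            n.UpgradeDomain != candidate.UpgradeDomain);

    // First acceptable candidate, or null: the replica stays unplaced
    // instead of being packed into an occupied domain.
    public static Node? PickReplacement(IEnumerable<Node> validCandidates, IReadOnlyList<Node> partitionReplicaNodes) =>
        validCandidates.FirstOrDefault(c => CanPlace(c, partitionReplicaNodes));
}
```

Replaying the nine-node scenario, if the only valid candidates are in fd:/0, where a replica already runs, PickReplacement returns null instead of packing a second replica into fd:/0.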

Code:

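using System.Fabric.Description;

// serviceDescription is an existing ServiceDescription; this policy takes no domain name
ServicePlacementRequireDomainDistributionPolicyDescription distributeDomain = new ServicePlacementRequireDomainDistributionPolicyDescription();
serviceDescription.PlacementPolicies.Add(distributeDomain);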

PowerShell:

New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("RequiredDomainDistribution")

Can these configurations be used for services in a cluster that isn't geographically spanned? They can, but there's no strong reason to do so. Avoid the required, invalid, and preferred domain configurations unless your scenario requires them. It doesn't make sense to force a given workload to run in a single rack, or to prefer one segment of your local cluster over another. Different hardware configurations should be spread across fault domains and handled via normal placement constraints and node properties.

Next steps