Placement policies for Service Fabric services

Placement policies are additional rules that can be used to govern service placement in some specific, less-common scenarios. Some examples of those scenarios are:

  • Your Service Fabric cluster spans geographic distances, such as multiple on-premises datacenters or multiple Azure regions
  • Your environment spans multiple areas of geopolitical or legal control, or some other case where you have policy boundaries you need to enforce
  • There are communication performance or latency considerations due to large distances or the use of slower or less reliable network links
  • You need to keep certain workloads collocated as a best effort, either with other workloads or in proximity to customers
  • You need multiple stateless instances of a partition on a single node

Most of these requirements align with the physical layout of the cluster, represented as its fault domains.

The advanced placement policies that help address these scenarios are:

  1. Invalid domains
  2. Required domains
  3. Preferred domains
  4. Disallowing replica packing
  5. Allowing multiple stateless instances on a node

Most of these controls could be configured via node properties and placement constraints, but some are more complicated. To make things simpler, the Service Fabric Cluster Resource Manager provides these additional placement policies. Placement policies are configured on a per-named-service-instance basis, and they can also be updated dynamically.
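As a sketch of what a dynamic update might look like in C# (assuming a reachable cluster and admin access; the service URI here is a placeholder, and the PlacementPolicies property on ServiceUpdateDescription is assumed to be available in your SDK version):

ApplicationName
using System;
using System.Collections.Generic;
using System.Fabric;
using System.Fabric.Description;

// Sketch: replace the placement policies of an existing stateful service at runtime.
FabricClient client = new FabricClient();

StatefulServiceUpdateDescription update = new StatefulServiceUpdateDescription();
update.PlacementPolicies = new List<ServicePlacementPolicyDescription>();
update.PlacementPolicies.Add(
    new ServicePlacementInvalidDomainPolicyDescription { DomainName = "fd:/DCEast" });

// "fabric:/MyApp/MyService" is a placeholder name for illustration.
await client.ServiceManager.UpdateServiceAsync(new Uri("fabric:/MyApp/MyService"), update);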

Specifying invalid domains

The InvalidDomain placement policy allows you to specify that a particular fault domain is invalid for a specific service. This policy ensures that the service never runs in a particular area, for example for geopolitical or corporate policy reasons. Multiple invalid domains can be specified via separate policies.



ServicePlacementInvalidDomainPolicyDescription invalidDomain = new ServicePlacementInvalidDomainPolicyDescription();
invalidDomain.DomainName = "fd:/DCEast"; //regulations prohibit this workload here


New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("InvalidDomain,fd:/DCEast")

Specifying required domains

The required domain placement policy requires that the service is present only in the specified domain. Multiple required domains can be specified via separate policies.



ServicePlacementRequiredDomainPolicyDescription requiredDomain = new ServicePlacementRequiredDomainPolicyDescription();
requiredDomain.DomainName = "fd:/DC01/RK03/BL2";


New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("RequiredDomain,fd:/DC01/RK03/BL2")

Specifying a preferred domain for the primary replicas of a stateful service

The preferred primary domain specifies the fault domain to place the Primary replica in. The Primary ends up in this domain when everything is healthy. If the domain or the Primary replica fails or shuts down, the Primary moves to some other location, ideally elsewhere in the same domain. If this new location isn't in the preferred domain, the Cluster Resource Manager moves it back to the preferred domain as soon as possible. Naturally, this setting only makes sense for stateful services. This policy is most useful in clusters that span Azure regions or multiple datacenters but have services that prefer placement in a certain location. Keeping Primaries close to their users or other services helps provide lower latency, especially for reads, which are handled by Primaries by default.


ServicePlacementPreferPrimaryDomainPolicyDescription primaryDomain = new ServicePlacementPreferPrimaryDomainPolicyDescription();
primaryDomain.DomainName = "fd:/ChinaEast/";


New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("PreferredPrimaryDomain,fd:/ChinaEast")

Requiring replica distribution and disallowing packing

Replicas are normally distributed across fault and upgrade domains when the cluster is healthy. However, there are cases where more than one replica for a given partition may end up temporarily packed into a single domain. For example, say the cluster has nine nodes in three fault domains, fd:/0, fd:/1, and fd:/2, and your service has three replicas. Suppose the nodes that were being used for those replicas in fd:/1 and fd:/2 went down. Normally the Cluster Resource Manager would prefer other nodes in those same fault domains, but in this case, say that due to capacity issues none of the other nodes in those domains were valid. If the Cluster Resource Manager builds replacements for those replicas, it has to choose nodes in fd:/0. However, doing that creates a situation where the fault domain constraint is violated. Packing replicas increases the chance that the whole replica set could go down or be lost.


For more information on constraints and constraint priorities generally, check out this topic.

If you've ever seen a health message such as "The Load Balancer has detected a Constraint Violation for this Replica:fabric:/<some service name> Secondary Partition <some partition ID> is violating the Constraint: FaultDomain", then you've hit this condition or something like it. Usually only one or two replicas are packed together temporarily. So long as fewer than a quorum of replicas are in a given domain, you're safe. Packing is rare, but it can happen, and usually these situations are transient since the nodes come back. If the nodes do stay down and the Cluster Resource Manager needs to build replacements, there are usually other nodes available in the ideal fault domains.

Some workloads would prefer always having the target number of replicas, even if they are packed into fewer domains. These workloads are betting against total simultaneous permanent domain failures and can usually recover local state. Other workloads would rather take downtime earlier than risk incorrectness or loss of data. Most production workloads run with more than three replicas, more than three fault domains, and many valid nodes per fault domain. Because of this, the default behavior is to allow domain packing, letting normal balancing and failover handle these extreme cases, even if that means temporary domain packing.

If you want to disable such packing for a given workload, you can specify the RequireDomainDistribution policy on the service. When this policy is set, the Cluster Resource Manager ensures that no two replicas from the same partition run in the same fault or upgrade domain.


ServicePlacementRequireDomainDistributionPolicyDescription distributeDomain = new ServicePlacementRequireDomainDistributionPolicyDescription();


New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateful -MinReplicaSetSize 3 -TargetReplicaSetSize 3 -PartitionSchemeSingleton -PlacementPolicy @("RequiredDomainDistribution")

Now, would it be possible to use these configurations for services in a cluster that isn't geographically spanned? You could, but there's not a great reason to. The required, invalid, and preferred domain configurations should be avoided unless the scenario requires them. It doesn't make any sense to try to force a given workload to run in a single rack, or to prefer one segment of your local cluster over another. Different hardware configurations should instead be spread across fault domains and handled via normal placement constraints and node properties.

Placing multiple stateless instances of a partition on a single node

The AllowMultipleStatelessInstancesOnNode placement policy allows placement of multiple stateless instances of a partition on a single node. By default, multiple instances of a single partition cannot be placed on a node. Even with an InstanceCount of -1, it is not possible to scale the number of instances of a given named service beyond the number of nodes in the cluster. This placement policy removes that restriction and allows InstanceCount to be specified higher than the node count.

If you've ever seen a health message such as "The Load Balancer has detected a Constraint Violation for this Replica:fabric:/<some service name> Secondary Partition <some partition ID> is violating the Constraint: ReplicaExclusion", then you've hit this condition or something like it.

By specifying the AllowMultipleStatelessInstancesOnNode policy on the service, InstanceCount can be set beyond the number of nodes in the cluster.


ServicePlacementAllowMultipleStatelessInstancesOnNodePolicyDescription allowMultipleInstances = new ServicePlacementAllowMultipleStatelessInstancesOnNodePolicyDescription();


New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName -Stateless -PartitionSchemeSingleton -PlacementPolicy @("AllowMultipleStatelessInstancesOnNode") -InstanceCount 10 -ServicePackageActivationMode ExclusiveProcess 


The placement policy is currently in preview and gated behind the EnableUnsupportedPreviewFeatures cluster setting. Because this is a preview feature for now, setting the preview configuration blocks cluster upgrades to or from that configuration. In other words, you need to create a new cluster to try out the feature.
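As a sketch, enabling the setting in a cluster's fabricSettings might look like the fragment below (placing it in the "Common" section is an assumption here; check your cluster manifest or ARM template for the right location):

"fabricSettings": [
  {
    "name": "Common",
    "parameters": [
      {
        "name": "EnableUnsupportedPreviewFeatures",
        "value": "true"
      }
    ]
  }
]

Remember that this must be set at cluster creation time, since the setting blocks upgrades.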


Currently, the policy is only supported for stateless services with the ExclusiveProcess service package activation mode.


The policy is not supported when used with static port endpoints. Using both in conjunction can lead to an unhealthy cluster, as multiple instances on the same node try to bind to the same port and cannot come up.

Next steps