Reconfiguration in Azure Service Fabric

A configuration is defined as the replicas and their roles for a partition of a stateful service.

A reconfiguration is the process of moving from one configuration to another. It changes the replica set for a partition of a stateful service. The old configuration is called the previous configuration (PC), and the new configuration is called the current configuration (CC). The reconfiguration protocol in Azure Service Fabric preserves consistency and maintains availability during any changes to the replica set.
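As a minimal sketch of these definitions, a configuration can be modeled as a mapping from replica to role, and a reconfiguration as the move from a PC to a CC. The names used here (`Role`, `Configuration`, `changed_replicas`, and the node names) are hypothetical and chosen for illustration only; they are not Service Fabric APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "Primary"
    ACTIVE_SECONDARY = "ActiveSecondary"


@dataclass(frozen=True)
class Configuration:
    """Replicas of one partition and the role each one holds (illustrative model)."""
    roles: dict  # replica name -> Role

    def primary(self):
        # In this sketch a configuration has at most one primary.
        for replica, role in self.roles.items():
            if role is Role.PRIMARY:
                return replica
        return None


def changed_replicas(pc, cc):
    """Return the replicas that were added, removed, or given a new role."""
    names = set(pc.roles) | set(cc.roles)
    return sorted(n for n in names if pc.roles.get(n) != cc.roles.get(n))


# Hypothetical partition: the primary moves from node1 to node2.
pc = Configuration({
    "node1": Role.PRIMARY,
    "node2": Role.ACTIVE_SECONDARY,
    "node3": Role.ACTIVE_SECONDARY,
})
cc = Configuration({
    "node1": Role.ACTIVE_SECONDARY,
    "node2": Role.PRIMARY,
    "node3": Role.ACTIVE_SECONDARY,
})
print(changed_replicas(pc, cc))
```

Comparing the PC and CC this way shows which replicas a given reconfiguration affects; the real protocol tracks considerably more state than this model does.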

Failover Manager initiates reconfigurations in response to different events in the system. For instance, if the primary fails, a reconfiguration is initiated to promote an active secondary to primary. Another example is an application upgrade, when it might be necessary to move the primary to another node so that the original node can be upgraded.

Reconfiguration types

Reconfigurations can be classified into two types:

  • Reconfigurations where the primary is changing:

    • Failover: Failovers are reconfigurations in response to the failure of a running primary.
    • SwapPrimary: Swaps are reconfigurations where Service Fabric needs to move a running primary from one node to another, usually in response to load balancing or an upgrade.
  • Reconfigurations where the primary is not changing.
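The classification above can be sketched as a small decision function. This is a simplification under stated assumptions: the function `classify`, its parameters, and the `ReconfigurationType` enum are hypothetical names, and the real decision is made internally by Failover Manager.

```python
from enum import Enum


class ReconfigurationType(Enum):
    FAILOVER = "Failover"
    SWAP_PRIMARY = "SwapPrimary"
    OTHER = "Other"  # the primary is not changing


def classify(pc_primary, cc_primary, pc_primary_is_up):
    """Classify a reconfiguration from the PC and CC primaries (illustrative only)."""
    if pc_primary == cc_primary:
        return ReconfigurationType.OTHER
    if pc_primary_is_up:
        # A running primary is being moved, e.g. for load balancing or an upgrade.
        return ReconfigurationType.SWAP_PRIMARY
    # The previously running primary failed, so another replica must take over.
    return ReconfigurationType.FAILOVER


print(classify("node1", "node2", pc_primary_is_up=False).value)
print(classify("node1", "node2", pc_primary_is_up=True).value)
print(classify("node1", "node1", pc_primary_is_up=True).value)
```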

Reconfiguration phases

A reconfiguration proceeds in several phases:

  • Phase 0: This phase happens in swap-primary reconfigurations, where the current primary transfers its state to the new primary and transitions to active secondary.

  • Phase 1: This phase happens during reconfigurations where the primary is changing. During this phase, Service Fabric identifies the correct primary among the current replicas. This phase is not needed during swap-primary reconfigurations because the new primary has already been chosen.

  • Phase 2: During this phase, Service Fabric ensures that all data is available in a majority of the replicas of the current configuration.

There are several other phases that are for internal use only.
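As a rough sketch of how the documented phases relate to the reconfiguration types, the function below lists which of Phase 0, Phase 1, and Phase 2 apply. The helper names `phases_for` and `has_majority` are hypothetical, the internal-only phases are left out, and treating Phase 2 as part of every reconfiguration is an assumption made for this illustration, since the text does not restrict it to a particular type.

```python
def phases_for(primary_changing, is_swap_primary):
    """Return the documented phases that apply to a reconfiguration (illustrative)."""
    phases = []
    if is_swap_primary:
        # The current primary hands its state to the already chosen new primary.
        phases.append("Phase 0")
    elif primary_changing:
        # The correct primary must be identified among the current replicas.
        phases.append("Phase 1")
    # Assumption: every reconfiguration makes data available on a majority.
    phases.append("Phase 2")
    return phases


def has_majority(replicas_with_data, total_replicas):
    """True when strictly more than half of the replicas have the data."""
    return replicas_with_data > total_replicas // 2


print(phases_for(primary_changing=True, is_swap_primary=True))
print(phases_for(primary_changing=True, is_swap_primary=False))
print(phases_for(primary_changing=False, is_swap_primary=False))
print(has_majority(2, 3), has_majority(2, 4))
```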

Stuck reconfigurations

Reconfigurations can get stuck for a variety of reasons. Some of the common reasons include:

  • Down replicas: Some reconfiguration phases require a majority of the replicas in the configuration to be up.
  • Network or communication problems: Reconfigurations require network connectivity between different nodes.
  • API failures: The reconfiguration protocol requires that service implementations finish certain APIs. For example, not honoring the cancellation token in a reliable service causes SwapPrimary reconfigurations to get stuck.
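The cancellation point in the last item can be illustrated with a small analogy. Reliable Services are normally written in C# or Java, so this Python sketch is not Service Fabric code: `threading.Event` stands in for the cancellation token, and `run_service` is a hypothetical work loop. The idea it shows is that the loop checks the signal on every iteration and returns promptly once cancellation is requested.

```python
import threading
import time


def run_service(cancel, poll_seconds=0.01):
    """Work loop that checks the cancellation signal on every iteration."""
    while not cancel.is_set():
        # Do one small unit of work here, then wait in a way that wakes up
        # early if cancellation is requested in the meantime.
        cancel.wait(poll_seconds)
    return "stopped"


cancel = threading.Event()
result = []
worker = threading.Thread(target=lambda: result.append(run_service(cancel)))
worker.start()

time.sleep(0.05)
cancel.set()               # stand-in for the runtime requesting cancellation
worker.join(timeout=1.0)   # a loop that honors the signal exits well within this

print(worker.is_alive(), result)
```

A loop that never checked the signal would still be running after the `join` timeout, which is the analogue of a replica that cannot give up the primary role during a swap.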

Use health reports from system components, such as System.FM, System.RA, and System.RAP, to diagnose where a reconfiguration is stuck. The system health report page describes these health reports.

Next steps

For more information on Service Fabric concepts, see the following articles: