Upgrading and updating an Azure Service Fabric cluster

For any modern system, designing for upgradability is key to the long-term success of your product. An Azure Service Fabric cluster is a resource that you own but that is partly managed by Azure. This article describes what is managed automatically and what you can configure yourself.

Controlling the fabric version that runs on your cluster

Make sure your cluster is always running a supported fabric version. Each time Azure announces the release of a new version of Service Fabric, the previous version is marked for end of support no sooner than 60 days after that date. New releases are announced on the Service Fabric team blog.

Fourteen days before the release your cluster is running expires, a health event is generated that puts your cluster into a warning health state. The cluster remains in a warning state until you upgrade to a supported fabric version.
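The timeline above comes down to simple date arithmetic. The following is a minimal sketch, assuming the minimum 60-day support window applies exactly; the announcement date is hypothetical, and the real end-of-support date is whatever Azure publishes for a given release.

```python
from datetime import date, timedelta

MIN_SUPPORT_DAYS = 60   # minimum from the text; the actual window may be longer
WARNING_LEAD_DAYS = 14  # the health warning starts this many days before expiry


def earliest_end_of_support(new_release_announced: date) -> date:
    """Earliest date on which the previous version can lose support."""
    return new_release_announced + timedelta(days=MIN_SUPPORT_DAYS)


def warning_start(end_of_support: date) -> date:
    """Date on which the cluster enters the warning health state."""
    return end_of_support - timedelta(days=WARNING_LEAD_DAYS)


# Hypothetical announcement date, for illustration only.
announced = date(2024, 1, 10)
expiry = earliest_end_of_support(announced)
print("Earliest end of support:", expiry)
print("Warning state begins:   ", warning_start(expiry))
```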

You can set your cluster to receive automatic fabric upgrades as Azure releases them, or you can select the supported fabric version you want your cluster to run. To learn more, read upgrade the Service Fabric version of your cluster.

Fabric upgrade behavior during automatic upgrades

Azure maintains the fabric code and configuration that run in an Azure cluster. We perform automatic monitored upgrades to the software on an as-needed basis. These upgrades can be code, configuration, or both. To make sure that your application suffers no impact or only minimal impact from these upgrades, they are performed in the following phases:

Phase 1: An upgrade is performed by using all cluster health policies

During this phase, the upgrade proceeds one upgrade domain at a time, and the applications that were running in the cluster continue to run without any downtime. The cluster health policies (for node health and application health) are adhered to during the upgrade.

If the cluster health policies are not met, the upgrade is rolled back and an email is sent to the owner of the subscription. The email contains the following information:

  • Notification that we had to roll back a cluster upgrade.
  • Suggested remedial actions, if any.
  • The number of days (n) until we execute Phase 2.

We try to execute the same upgrade a few more times in case an upgrade failed for infrastructure reasons. After n days from the date the email was sent, we proceed to Phase 2.

If the cluster health policies are met, the upgrade is considered successful and marked complete. This can happen during the initial upgrade or any of the upgrade reruns in this phase. There is no email confirmation of a successful run. This is to avoid sending you too many emails; receiving an email should be treated as an exception to the norm. We expect most cluster upgrades to succeed without affecting your application availability.
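The behavior described for this phase (upgrade one domain, check health, roll back on a policy violation) can be pictured with a small simulation. This is only an illustrative sketch, not how Service Fabric implements it: the function names and the callbacks are hypothetical stand-ins for the platform's own upgrade and health-evaluation logic.

```python
from typing import Callable, List


def rolling_upgrade(
    upgrade_domains: List[str],
    upgrade_domain: Callable[[str], None],
    rollback_domain: Callable[[str], None],
    cluster_is_healthy: Callable[[], bool],
) -> bool:
    """Upgrade one domain at a time; undo everything if health policies fail."""
    done: List[str] = []
    for ud in upgrade_domains:
        upgrade_domain(ud)
        done.append(ud)
        if not cluster_is_healthy():
            # Policy violated: roll back the domains upgraded so far,
            # most recent first.
            for finished in reversed(done):
                rollback_domain(finished)
            return False
    return True


# Hypothetical run: the cluster turns unhealthy once UD1 has been upgraded.
log: List[tuple] = []
succeeded = rolling_upgrade(
    ["UD0", "UD1", "UD2"],
    upgrade_domain=lambda ud: log.append(("upgrade", ud)),
    rollback_domain=lambda ud: log.append(("rollback", ud)),
    cluster_is_healthy=lambda: ("upgrade", "UD1") not in log,
)
print(succeeded, log)
```

Because only one upgrade domain is touched at a time, replicas in the other domains keep serving traffic, which is why the text can promise no downtime for healthy applications.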

Phase 2: An upgrade is performed by using default health policies only

The health policies in this phase are set so that the number of applications that were healthy at the beginning of the upgrade remains the same for the duration of the upgrade process. As in Phase 1, the Phase 2 upgrade proceeds one upgrade domain at a time, and the applications that were running in the cluster continue to run without any downtime. The cluster health policies (a combination of node health and the health of all the applications running in the cluster) are adhered to for the duration of the upgrade.

If the cluster health policies in effect are not met, the upgrade is rolled back. An email is then sent to the owner of the subscription. The email contains the following information:

  • Notification that we had to roll back a cluster upgrade.
  • Suggested remedial actions, if any.
  • The number of days (n) until we execute Phase 3.

We try to execute the same upgrade a few more times in case an upgrade failed for infrastructure reasons. A reminder email is sent a couple of days before the n days are up. After n days from the date the email was sent, we proceed to Phase 3. Take the emails we send you in Phase 2 seriously and act on the suggested remedial actions.

If the cluster health policies are met, the upgrade is considered successful and marked complete. This can happen during the initial upgrade or any of the upgrade reruns in this phase. There is no email confirmation of a successful run.

Phase 3: An upgrade is performed by using aggressive health policies

The health policies in this phase are geared towards completing the upgrade rather than preserving the health of the applications. Few cluster upgrades end up in this phase. If your cluster gets to this phase, there is a good chance that your application becomes unhealthy, loses availability, or both.

As in the other two phases, Phase 3 upgrades proceed one upgrade domain at a time.

If the cluster health policies are not met, the upgrade is rolled back. We try to execute the same upgrade a few more times in case an upgrade failed for infrastructure reasons. After that, the cluster is pinned, so that it no longer receives support or upgrades.

An email with this information is sent to the subscription owner, along with the remedial actions. We do not expect any cluster to reach a state where Phase 3 has failed.

If the cluster health policies are met, the upgrade is considered successful and marked complete. This can happen during the initial upgrade or any of the upgrade reruns in this phase. There is no email confirmation of a successful run.
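Taken together, the three phases form an escalation ladder: each phase retries the upgrade a few times under its own health policies, and only a failure of every retry moves the cluster to the next phase. The sketch below models that flow under stated assumptions. The retry count of 3 is hypothetical (the text only says "a few more times"), the `attempt` callback stands in for a full monitored upgrade, and the n-day waits and emails are reduced to a comment.

```python
from typing import Callable

PHASE_POLICIES = {
    1: "all cluster health policies",
    2: "default health policies only",
    3: "aggressive health policies",
}


def run_automatic_upgrade(
    attempt: Callable[[int], bool],
    retries_per_phase: int = 3,  # hypothetical; the source gives no number
) -> str:
    """Return the outcome of an automatic fabric upgrade across all phases."""
    for phase in sorted(PHASE_POLICIES):
        for _ in range(retries_per_phase):
            if attempt(phase):
                # Success is silent: no confirmation email is sent.
                return f"completed in phase {phase}"
        # Every retry was rolled back. The owner is emailed, and for
        # phases 1 and 2 the next phase starts n days later.
    return "pinned"  # Phase 3 failed: no further support or upgrades


# Hypothetical cluster whose custom policies block the upgrade in Phase 1.
print(run_automatic_upgrade(lambda phase: phase >= 2))
```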

Manage certificates

Service Fabric uses the X.509 server certificates that you specify when you create a cluster to secure communications between cluster nodes and to authenticate clients. You can add, update, or delete certificates for the cluster and client in the Azure portal or by using PowerShell or the Azure CLI. To learn more, read add or remove certificates.

Open application ports

You can change application ports by changing the properties of the Load Balancer resource that is associated with the node type. You can use the Azure portal, PowerShell, or the Azure CLI. For more information, read Open application ports for a cluster.

Define node properties

Sometimes you may want to ensure that certain workloads run only on certain types of nodes in the cluster. For example, some workloads may require GPUs or SSDs while others may not. For each node type in a cluster, you can add custom node properties to the cluster nodes. Placement constraints are statements attached to individual services that select for one or more node properties. Placement constraints define where services should run.
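To make the relationship between node properties and placement constraints concrete, here is a simplified sketch. Service Fabric expresses constraints as expression strings evaluated by the Cluster Resource Manager; this example deliberately does not parse that syntax. Instead it assumes a constraint is a plain dictionary of required property values, and the node names and properties are hypothetical.

```python
from typing import Dict, List


def eligible_nodes(
    nodes: Dict[str, Dict[str, str]],
    required: Dict[str, str],
) -> List[str]:
    """Return the nodes whose properties satisfy every required value."""
    return sorted(
        name
        for name, props in nodes.items()
        if all(props.get(key) == value for key, value in required.items())
    )


# Hypothetical cluster: property names and values are illustrative only.
cluster = {
    "node-a": {"NodeType": "Compute", "HasGPU": "true", "HasSSD": "true"},
    "node-b": {"NodeType": "Compute", "HasGPU": "false", "HasSSD": "true"},
    "node-c": {"NodeType": "FrontEnd", "HasGPU": "false", "HasSSD": "false"},
}

print(eligible_nodes(cluster, {"HasGPU": "true"}))
print(eligible_nodes(cluster, {"NodeType": "Compute", "HasSSD": "true"}))
```

A service with no constraint (an empty dictionary here) is eligible for every node, which mirrors the idea that constraints only ever narrow the set of candidate nodes.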

For details on placement constraints, node properties, and how to define them, read node properties and placement constraints.

Add capacity metrics

For each node type, you can add custom capacity metrics that your applications use to report load. For details on using capacity metrics to report load, refer to the Service Fabric Cluster Resource Manager documents on Describing Your Cluster and Metrics and Load.
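As a rough illustration of how capacity metrics and reported load interact, the sketch below checks whether a service's load would fit on a node. It is a simplification under stated assumptions: the metric names are hypothetical, and a metric with no declared capacity on the node is treated as unconstrained. Consult the Cluster Resource Manager documents mentioned above for the actual rules.

```python
from typing import Dict


def fits_on_node(
    capacity: Dict[str, int],
    current_load: Dict[str, int],
    service_load: Dict[str, int],
) -> bool:
    """True if adding service_load keeps every capped metric within capacity."""
    for metric, needed in service_load.items():
        limit = capacity.get(metric)
        if limit is None:
            # Assumption in this sketch: no declared capacity, no constraint.
            continue
        if current_load.get(metric, 0) + needed > limit:
            return False
    return True


# Hypothetical node type with two custom capacity metrics.
node_capacity = {"MemoryInMb": 2048, "ClientConnections": 20000}
node_load = {"MemoryInMb": 1536, "ClientConnections": 4000}

print(fits_on_node(node_capacity, node_load, {"MemoryInMb": 256}))
print(fits_on_node(node_capacity, node_load, {"MemoryInMb": 1024}))
```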

Set health policies for automatic upgrades

You can specify custom health policies for fabric upgrades. If you have set your cluster to Automatic fabric upgrades, these policies are applied to Phase 1 of the automatic fabric upgrades. If you have set your cluster to Manual fabric upgrades, these policies are applied each time you select a new version, which triggers the system to start the fabric upgrade in your cluster. If you do not override the policies, the defaults are used.
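A health policy of this kind is typically a set of tolerated-failure thresholds. The following sketch shows one way such a policy could be evaluated; the class, its field names, and the default values are hypothetical, and the platform's own evaluation (including how it rounds percentages on small clusters) may differ.

```python
from dataclasses import dataclass


@dataclass
class UpgradeHealthPolicy:
    # Hypothetical field names; 0 means no unhealthy entity is tolerated.
    max_percent_unhealthy_nodes: int = 0
    max_percent_unhealthy_applications: int = 0


def within_limit(unhealthy: int, total: int, max_percent: int) -> bool:
    """Compare the unhealthy share to the limit using integer arithmetic."""
    if total == 0:
        return True
    return unhealthy * 100 <= max_percent * total


def upgrade_may_proceed(
    policy: UpgradeHealthPolicy,
    unhealthy_nodes: int,
    total_nodes: int,
    unhealthy_apps: int,
    total_apps: int,
) -> bool:
    return within_limit(
        unhealthy_nodes, total_nodes, policy.max_percent_unhealthy_nodes
    ) and within_limit(
        unhealthy_apps, total_apps, policy.max_percent_unhealthy_applications
    )


# A custom policy that tolerates up to 20% unhealthy nodes during the upgrade.
custom = UpgradeHealthPolicy(max_percent_unhealthy_nodes=20)
print(upgrade_may_proceed(custom, 1, 5, 0, 3))
print(upgrade_may_proceed(custom, 2, 5, 0, 3))
```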

You can specify custom health policies or review the current settings under the "fabric upgrade" blade by selecting the advanced upgrade settings. The following screenshot shows where to find them.

[Screenshot: Manage custom health policies]

Customize Fabric settings for your cluster

Many configuration settings can be customized on a cluster, such as the reliability level of the cluster and node properties. For more information, read Service Fabric cluster fabric settings.

Patch the OS in the cluster nodes

The patch orchestration application (POA) is a Service Fabric application that automates operating system patching on a Service Fabric cluster without downtime. You can deploy the Patch Orchestration Application for Windows on your cluster to install patches in an orchestrated manner while keeping your services available throughout.

Next steps