Manage rolling upgrades of cloud applications by using SQL Database active geo-replication

APPLIES TO: Azure SQL Database

Learn how to use active geo-replication in Azure SQL Database to enable rolling upgrades of your cloud application. Because upgrades are disruptive operations, they should be part of your business-continuity planning and design. In this article, we look at two different methods of orchestrating the upgrade process and discuss the benefits and tradeoffs of each option. For the purposes of this article, we refer to an application that consists of a website connected to a single database as its data tier. Our goal is to upgrade version 1 (V1) of the application to version 2 (V2) without any significant impact on the user experience.

When evaluating upgrade options, consider these factors:

  • Impact on application availability during upgrades, such as how long application functions might be limited or degraded.
  • Ability to roll back if the upgrade fails.
  • Vulnerability of the application if an unrelated, catastrophic failure occurs during the upgrade.
  • Total dollar cost. This factor includes additional database redundancy and incremental costs of the temporary components used by the upgrade process.

Upgrade applications that rely on database backups for disaster recovery

If your application relies on automatic database backups and uses geo-restore for disaster recovery, it's deployed to a single Azure region. To minimize user disruption, create a staging environment in that region with all the application components involved in the upgrade. The first diagram illustrates the operational environment before the upgrade process. The endpoint contoso.chinacloudsites.cn represents a production environment of the web app. To be able to roll back the upgrade, you must create a staging environment with a fully synchronized copy of the database. Follow these steps to create a staging environment for the upgrade:

  1. Create a secondary database in the same Azure region. Monitor the secondary to confirm that the seeding process is complete (1).
  2. Create a new environment for your web app and call it 'Staging'. It will be registered in Azure DNS with the URL contoso-staging.chinacloudsites.cn (2).
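Step (1) can also be scripted in Transact-SQL. The following is a minimal sketch, not the only way to do it. It assumes the production database is named contoso_db and that a second logical server, contoso-staging-srv, already exists in the same region; both names are hypothetical. Active geo-replication places the secondary on a different logical server, even when that server is in the same region. Azure SQL Database doesn't support USE to switch databases, so run each statement group on the connection named in its comment.

```sql
-- Hypothetical names: contoso_db (production database),
-- contoso-staging-srv (staging logical server in the same region).

-- Step (1): create the secondary database.
-- Connection: master database on the primary server.
ALTER DATABASE [contoso_db]
    ADD SECONDARY ON SERVER [contoso-staging-srv]
    WITH (ALLOW_CONNECTIONS = ALL);

-- Monitor seeding. Connection: master database on the primary server.
-- replication_state_desc reports SEEDING while the initial copy runs and
-- CATCH_UP once the secondary is transactionally consistent.
SELECT d.name AS database_name,
       l.partner_server,
       l.partner_database,
       l.role_desc,
       l.replication_state_desc
FROM sys.geo_replication_links AS l
JOIN sys.databases AS d
    ON d.database_id = l.database_id
WHERE d.name = N'contoso_db';
```

Rerun the query until replication_state_desc reports CATCH_UP before you continue to the upgrade itself.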

Note

These preparation steps won't impact the production environment, which can continue to function in full-access mode.

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

When the preparation steps are complete, the application is ready for the actual upgrade. The next diagram illustrates the steps involved in the upgrade process:

  1. Set the primary database to read-only mode (3). This mode guarantees that the production environment of the web app (V1) remains read-only during the upgrade, thus preventing data divergence between the V1 and V2 database instances.
  2. Disconnect the secondary database by using the planned termination mode (4). This action creates a fully synchronized, independent copy of the primary database. This copy is the database that will be upgraded.
  3. Switch the secondary database to read-write mode and run the upgrade script (5).
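Steps (3) to (5) correspond to a few Transact-SQL statements. This sketch reuses the hypothetical contoso_db and contoso-staging-srv names. The call to sys.sp_wait_for_database_copy_sync is an addition, not part of the steps above. It is one way to confirm that every committed transaction has reached the secondary before the link is removed. ADD/REMOVE SECONDARY must run in master. The connections shown for the SET statements are assumptions, so check the ALTER DATABASE reference for your environment.

```sql
-- Step (3): block writes on production.
-- Connection: primary server (assumed context for SET options).
ALTER DATABASE [contoso_db] SET READ_ONLY;

-- Wait until all committed transactions are hardened on the secondary.
-- Connection: contoso_db on the primary server.
EXEC sys.sp_wait_for_database_copy_sync
     @target_server = N'contoso-staging-srv',
     @target_database = N'contoso_db';

-- Step (4): terminate geo-replication; the secondary becomes an
-- independent database. Connection: master on the primary server.
ALTER DATABASE [contoso_db]
    REMOVE SECONDARY ON SERVER [contoso-staging-srv];

-- Step (5): make the independent copy writable, then run the upgrade
-- script against it. Connection: contoso-staging-srv.
ALTER DATABASE [contoso_db] SET READ_WRITE;
```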

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

If the upgrade finishes successfully, you're now ready to switch users to the upgraded copy of the application, which becomes the production environment. Switching involves a few more steps, as illustrated in the next diagram:

  1. Activate a swap operation between the production and staging environments of the web app (6). This operation switches the URLs of the two environments. Now contoso.chinacloudsites.cn points to the V2 version of the website and the database (production environment).
  2. If you no longer need the V1 version, which became a staging copy after the swap, you can decommission the staging environment (7).

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

If the upgrade process is unsuccessful (for example, due to an error in the upgrade script), consider the staging environment to be compromised. To roll back the application to the pre-upgrade state, restore full access to the application in the production environment. The next diagram shows the rollback steps:

  1. Set the production database copy back to read-write mode (8). This action restores the full V1 functionality of the production copy.
  2. Perform a root-cause analysis and decommission the staging environment (9).
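A minimal Transact-SQL sketch of the rollback, again using the hypothetical contoso_db and contoso-staging-srv names. Because the staging copy has the same name as the production database, take care that each statement runs on the intended server.

```sql
-- Step (8): restore full V1 functionality on production.
-- Connection: primary server (assumed context for SET options).
ALTER DATABASE [contoso_db] SET READ_WRITE;

-- Step (9): after the root-cause analysis, remove the staging copy.
-- Connection: master on contoso-staging-srv. The copy shares its name
-- with production, so double-check that you're on the staging server.
DROP DATABASE [contoso_db];
```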

At this point, the application is fully functional, and you can repeat the upgrade steps.

Note

The rollback doesn't require DNS changes because you haven't yet performed a swap operation.

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

The key advantage of this option is that you can upgrade an application in a single region by following a set of simple steps. The dollar cost of the upgrade is relatively low.

The main tradeoff is that, if a catastrophic failure occurs during the upgrade, recovering to the pre-upgrade state involves redeploying the application in a different region and restoring the database from backup by using geo-restore. This process results in significant downtime.

Upgrade applications that rely on database geo-replication for disaster recovery

If your application uses active geo-replication or auto-failover groups for business continuity, it's deployed to at least two different regions. There's an active, primary database in a primary region and a read-only, secondary database in a backup region. Along with the factors mentioned at the beginning of this article, the upgrade process must also guarantee that:

  • The application remains protected from catastrophic failures at all times during the upgrade process.
  • The geo-redundant components of the application are upgraded in parallel with the active components.

To achieve these goals, in addition to using the Web Apps environments, you'll take advantage of Azure Traffic Manager by using a failover profile with one active endpoint and one backup endpoint. The next diagram illustrates the operational environment before the upgrade process. The websites contoso-1.chinacloudsites.cn and contoso-dr.chinacloudsites.cn represent a production environment of the application with full geographic redundancy. The production environment includes the following components:

  • The production environment of the web app contoso-1.chinacloudsites.cn in the primary region (1)
  • The primary database in the primary region (2)
  • A standby instance of the web app in the backup region (3)
  • The geo-replicated secondary database in the backup region (4)
  • A Traffic Manager failover profile with an online endpoint called contoso-1.chinacloudsites.cn and an offline endpoint called contoso-dr.chinacloudsites.cn (5)
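Before you start, it can help to confirm that the production database really is the primary of a healthy geo-replication link. The following is a minimal check, assuming the production database is named contoso_db (hypothetical). Run it while connected to that database.

```sql
-- Connection: contoso_db on the primary server (hypothetical name).
-- Expect one row per geo-secondary, with role_desc = PRIMARY, the
-- backup-region server as partner_server, and replication_state_desc
-- = CATCH_UP for a fully seeded secondary.
SELECT partner_server,
       partner_database,
       role_desc,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;
```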

To make it possible to roll back the upgrade, you must create a staging environment with a fully synchronized copy of the application. Because the application must be able to recover quickly if a catastrophic failure occurs during the upgrade process, the staging environment must also be geo-redundant. The following steps are required to create a staging environment for the upgrade:

  1. Deploy a staging environment of the web app in the primary region (6).
  2. Create a secondary database in the primary Azure region (7). Configure the staging environment of the web app to connect to it.
  3. Create another geo-redundant, secondary database in the backup region by replicating the secondary database in the primary region (8). This method is called chained geo-replication.
  4. Deploy a staging environment of the web app instance in the backup region (9) and configure it to connect to the geo-redundant secondary database created at (8).
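Steps (7) and (8) can be sketched in Transact-SQL as two ADD SECONDARY statements. The second statement is issued on the server that hosts the first secondary, which creates the chain. Server names contoso-1-staging-srv (primary region) and contoso-dr-staging-srv (backup region) are hypothetical. Geo-secondaries keep the name of their source database, so every copy here is named contoso_db on a different server. This sketch assumes chaining is available for your database.

```sql
-- Hypothetical servers: contoso-1-staging-srv (primary region),
-- contoso-dr-staging-srv (backup region); contoso_db is the production name.

-- Step (7): secondary of production in the primary region.
-- Connection: master on the production primary server.
ALTER DATABASE [contoso_db]
    ADD SECONDARY ON SERVER [contoso-1-staging-srv]
    WITH (ALLOW_CONNECTIONS = ALL);

-- Step (8): once (7) reports CATCH_UP, chain a secondary of that
-- secondary into the backup region.
-- Connection: master on contoso-1-staging-srv.
ALTER DATABASE [contoso_db]
    ADD SECONDARY ON SERVER [contoso-dr-staging-srv]
    WITH (ALLOW_CONNECTIONS = ALL);
```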

Note

These preparation steps won't impact the application in the production environment. It will remain fully functional in read-write mode.

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

When the preparation steps are complete, the staging environment is ready for the upgrade. The next diagram illustrates these upgrade steps:

  1. Set the primary database in the production environment to read-only mode (10). This mode guarantees that the production database (V1) won't change during the upgrade, thus preventing data divergence between the V1 and V2 database instances.
-- Set the production database to read-only mode.
-- Replace <Prod_DB> with the name of your production database.
ALTER DATABASE <Prod_DB>
SET READ_ONLY;
  2. Terminate geo-replication by disconnecting the secondary (11). This action creates an independent but fully synchronized copy of the production database. This copy is the database that will be upgraded. The following example uses Transact-SQL, but PowerShell is also available.
-- Disconnect the secondary, terminating geo-replication.
-- Run in the master database of the primary server; replace <Prod_DB> and
-- <Partner-Server> with your production database and staging server names.
ALTER DATABASE <Prod_DB>
REMOVE SECONDARY ON SERVER <Partner-Server>;
  3. Run the upgrade script against contoso-1-staging.chinacloudsites.cn, contoso-dr-staging.chinacloudsites.cn, and the staging primary database (12). The database changes will be replicated automatically to the staging secondary.
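The source doesn't show an upgrade script. The following is a hypothetical sketch of the database part of step (12). The table dbo.Customers and column LoyaltyTier are placeholders. Wrapping the changes in one transaction with XACT_ABORT ON is a design choice, not something the steps above require: a run-time error then rolls back everything instead of leaving a half-upgraded staging database. The final call optionally waits until the changes are hardened on the chained secondary. It also assumes the staging copy is writable. If it inherited read-only mode from step (10), switch it to read-write first.

```sql
-- Connection: contoso_db on contoso-1-staging-srv (hypothetical names).
SET XACT_ABORT ON;  -- any run-time error rolls back the whole transaction

BEGIN TRANSACTION;

-- Placeholder V2 change: add a nullable column.
ALTER TABLE dbo.Customers
    ADD LoyaltyTier TINYINT NULL;

-- ...further V2 schema and data changes...

COMMIT TRANSACTION;

-- Optional: wait until the changes reach the staging secondary (8).
EXEC sys.sp_wait_for_database_copy_sync
     @target_server = N'contoso-dr-staging-srv',
     @target_database = N'contoso_db';
```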

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

If the upgrade finishes successfully, you're now ready to switch users to the V2 version of the application. The next diagram illustrates the steps involved:

  1. Activate a swap operation between the production and staging environments of the web app in the primary region (13) and in the backup region (14). V2 of the application now becomes the production environment, with a redundant copy in the backup region.
  2. If you no longer need the V1 application (15 and 16), you can decommission the staging environment.

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

If the upgrade process is unsuccessful (for example, due to an error in the upgrade script), consider the staging environment to be in an inconsistent state. To roll back the application to the pre-upgrade state, revert to using V1 of the application in the production environment. The required steps are shown in the next diagram:

  1. Set the primary database copy in the production environment to read-write mode (17). This action restores full V1 functionality in the production environment.
  2. Perform a root-cause analysis and repair or remove the staging environment (18 and 19).
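A minimal Transact-SQL sketch of this rollback, using the hypothetical names from the earlier steps. If you choose to remove rather than repair the staging environment, break the chained link before dropping the databases, so that neither copy is still replicating.

```sql
-- Step (17): re-enable writes on the V1 production primary.
-- Connection: production primary server (assumed context for SET options).
ALTER DATABASE [contoso_db] SET READ_WRITE;

-- Steps (18, 19), removal path only.
-- Connection: master on contoso-1-staging-srv.
ALTER DATABASE [contoso_db]
    REMOVE SECONDARY ON SERVER [contoso-dr-staging-srv];
DROP DATABASE [contoso_db];

-- Connection: master on contoso-dr-staging-srv.
DROP DATABASE [contoso_db];
```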

At this point, the application is fully functional, and you can repeat the upgrade steps.

Note

The rollback doesn't require DNS changes because you didn't perform a swap operation.

Diagram: SQL Database geo-replication configuration for cloud disaster recovery.

The key advantage of this option is that you can upgrade both the application and its geo-redundant copy in parallel without compromising your business continuity during the upgrade.

The main tradeoff is that it requires double redundancy of each application component and therefore incurs higher dollar cost. It also involves a more complicated workflow.

Summary

The two upgrade methods described in this article differ in complexity and dollar cost, but they both focus on minimizing how long users are limited to read-only operations. That time is directly defined by the duration of the upgrade script. It doesn't depend on the database size, the service tier you chose, the website configuration, or other factors that you can't easily control. All preparation steps are decoupled from the upgrade steps and don't impact the production application. The efficiency of the upgrade script is a key factor that determines the user experience during upgrades. So, the best way to improve that experience is to focus your efforts on making the upgrade script as efficient as possible.

Next steps