Performing a Disaster Recovery Drill

We recommend that you periodically validate your application's readiness for the recovery workflow. Verifying how the application behaves, and how it is affected by data loss and/or the disruption that failover involves, is good engineering practice. Many industry standards also require it as part of business continuity certification.

Performing a disaster recovery drill consists of:

  • Simulating a data tier outage
  • Recovering
  • Validating application integrity after recovery
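The three phases can be sketched as a small runner that executes them in order and stops at the first failure, so a failed recovery is never followed by a misleading validation pass. This is a minimal Python sketch; `run_drill`, `DrillResult`, and the phase names are hypothetical, and each phase is assumed to be a callable that returns `True` on success.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A phase is a (name, action) pair; the action returns True on success.
Phase = Tuple[str, Callable[[], bool]]


@dataclass
class DrillResult:
    completed: List[str] = field(default_factory=list)  # phases that succeeded
    failed_phase: Optional[str] = None  # first phase that failed, if any

    @property
    def succeeded(self) -> bool:
        return self.failed_phase is None


def run_drill(phases: List[Phase]) -> DrillResult:
    """Run drill phases in order, stopping at the first one that fails."""
    result = DrillResult()
    for name, action in phases:
        if not action():
            result.failed_phase = name
            break
        result.completed.append(name)
    return result
```

For example, `run_drill([("simulate outage", simulate), ("recover", recover), ("validate", validate)])`, where `simulate`, `recover`, and `validate` stand in for your own (hypothetical) drill functions.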

The workflow for executing the drill varies depending on how you designed your application for business continuity. This article describes best practices for conducting a disaster recovery drill with Azure SQL Database.

Geo-restore

To avoid potential data loss during a disaster recovery drill, run the drill in a test environment: create a copy of the production environment and use it to verify the application's failover workflow.
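For Azure SQL Database, one way to create the test copy of a production database is the T-SQL `CREATE DATABASE ... AS COPY OF` statement, run while connected to the `master` database of the test server. The Python sketch below only builds that statement; the server and database names are placeholders and the helpers `quote_name` and `build_copy_statement` are hypothetical. Copying the rest of the production environment (application tier, configuration) is outside its scope.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any ']' (like QUOTENAME)."""
    return "[" + name.replace("]", "]]") + "]"


def build_copy_statement(source_server: str, source_db: str, target_db: str) -> str:
    """Build the statement that copies a production database for the drill.

    Execute the result in the master database of the test server.
    """
    return (
        f"CREATE DATABASE {quote_name(target_db)} "
        f"AS COPY OF {quote_name(source_server)}.{quote_name(source_db)};"
    )
```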

Outage simulation

To simulate the outage, rename the source database in the test environment. The name change causes the application's connections to fail.
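Assuming the drill runs against the test copy, the rename can be done with `ALTER DATABASE ... MODIFY NAME`, executed in the `master` database of the server that hosts it. The sketch below builds both the statement that simulates the outage and the one that reverts it afterward; the `_offline` suffix and the function names are hypothetical choices.

```python
from typing import Tuple


def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any ']' (like QUOTENAME)."""
    return "[" + name.replace("]", "]]") + "]"


def build_rename(old_name: str, new_name: str) -> str:
    """Build an ALTER DATABASE ... MODIFY NAME statement."""
    return f"ALTER DATABASE {quote_name(old_name)} MODIFY NAME = {quote_name(new_name)};"


def outage_statements(db_name: str, suffix: str = "_offline") -> Tuple[str, str]:
    """Return (simulate, revert) statements for a rename-based outage drill."""
    offline_name = db_name + suffix
    return build_rename(db_name, offline_name), build_rename(offline_name, db_name)
```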

Recovery

  • Perform a geo-restore of the database to a different server, as described in the geo-restore documentation.
  • Change the application configuration to connect to the recovered database, then follow the Configure a database after recovery guide to complete the recovery.
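Repointing the application usually means changing the server and database in its connection string. The following is a minimal sketch that assumes an ADO.NET-style, semicolon-separated connection string whose values contain no quoted `;`; the function names are hypothetical.

```python
from typing import List, Tuple


def parse_conn_str(conn_str: str) -> List[Tuple[str, str]]:
    """Split 'Key=Value;Key=Value;' into ordered pairs (no quoting support)."""
    pairs = []
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # skip empty segments such as a trailing ';'
        key, _, value = part.partition("=")  # split on the first '=' only
        pairs.append((key.strip(), value.strip()))
    return pairs


def repoint(conn_str: str, server: str, database: str) -> str:
    """Point a connection string at a recovered server and database."""
    out = []
    for key, value in parse_conn_str(conn_str):
        lowered = key.lower()
        if lowered in ("server", "data source"):
            value = f"tcp:{server}.database.windows.net,1433"
        elif lowered in ("initial catalog", "database"):
            value = database
        out.append(f"{key}={value}")
    return ";".join(out) + ";"
```

Under these assumptions, `repoint("Server=tcp:prodserver.database.windows.net,1433;Initial Catalog=SalesDb;Encrypt=True;", "drserver", "SalesDb_restored")` changes only the server and catalog and leaves settings such as `Encrypt` untouched. In real applications, prefer your driver's own connection string builder, which handles quoting.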

Validation

Complete the drill by verifying application integrity after recovery, including connection strings, logins, basic functionality tests, and any other validations that are part of your standard application sign-off procedures.
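The sign-off checks can be collected into a simple checklist runner that records every failure instead of stopping at the first one, which gives the drill report a complete picture. This is a hypothetical sketch; the check names and callables are placeholders for your own connection-string, login, and functionality tests.

```python
from typing import Callable, Dict, List


def run_signoff(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return a description of each failure.

    A check fails if it returns a falsy value or raises an exception;
    all checks run even after a failure so the report is complete.
    """
    failures: List[str] = []
    for name, check in checks.items():
        try:
            passed = bool(check())
        except Exception as exc:  # report the error instead of aborting
            failures.append(f"{name}: {exc}")
            continue
        if not passed:
            failures.append(f"{name}: check did not pass")
    return failures
```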

Failover groups

For a database protected by a failover group, the drill consists of a planned failover to the secondary server. A planned failover ensures that the primary and secondary databases in the failover group remain in sync when their roles are switched. Unlike an unplanned failover, it does not result in data loss, so you can perform the drill in the production environment.
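The difference between the two failover types can be illustrated with a toy model of asynchronous replication: a planned failover first ships all pending transactions to the secondary and only then swaps roles, while an unplanned (forced) failover swaps immediately and loses whatever had not yet replicated. This is a simplified illustration, not how Azure SQL Database implements geo-replication; the `Replica` and `FailoverPair` classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    log: List[str] = field(default_factory=list)  # committed transactions, in order


@dataclass
class FailoverPair:
    primary: Replica
    secondary: Replica

    def write(self, txn: str) -> None:
        """Commit on the primary only; replication happens later (asynchronously)."""
        self.primary.log.append(txn)

    def pending(self) -> List[str]:
        """Transactions committed on the primary but not yet on the secondary."""
        return self.primary.log[len(self.secondary.log):]

    def replicate(self, count: int) -> None:
        """Ship up to `count` pending transactions to the secondary."""
        self.secondary.log.extend(self.pending()[:count])

    def planned_failover(self) -> List[str]:
        """Synchronize fully, then swap roles; returns lost transactions."""
        self.replicate(len(self.pending()))
        lost = self.pending()  # empty after full synchronization
        self.primary, self.secondary = self.secondary, self.primary
        return lost

    def unplanned_failover(self) -> List[str]:
        """Swap roles immediately; unreplicated transactions are lost.

        The model does not re-synchronize the old primary afterward.
        """
        lost = self.pending()
        self.primary, self.secondary = self.secondary, self.primary
        return lost
```

In the model, after three writes with only the first replicated, an unplanned failover reports the other two as lost, while a planned failover loses none. That is why only the planned variant is suitable for a drill in production.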

Outage simulation

To simulate the outage, disable the web application or virtual machine that is connected to the database. This causes connection failures for the web clients.
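If the application runs in Azure App Service, the Azure CLI commands `az webapp stop` and `az webapp start` are one way to take it offline and bring it back (for a virtual machine, `az vm stop` and `az vm start` play the same role). The sketch below only builds the argument lists, suitable for `subprocess.run`, and does not execute them; the app and resource group names are placeholders.

```python
import shlex
from typing import Dict, List


def webapp_outage_commands(app_name: str, resource_group: str) -> Dict[str, List[str]]:
    """Build Azure CLI argument lists to stop (simulate) and start (restore) a web app."""
    target = ["--name", app_name, "--resource-group", resource_group]
    return {
        "simulate": ["az", "webapp", "stop", *target],
        "restore": ["az", "webapp", "start", *target],
    }


if __name__ == "__main__":
    # Print for review; pass a list to subprocess.run(argv, check=True) to execute it.
    for step, argv in webapp_outage_commands("contoso-web", "drill-rg").items():
        print(f"{step}: {shlex.join(argv)}")
```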

Recovery

  • Make sure the application configuration in the DR region points to the former secondary database, which becomes the new, fully accessible primary after failover.
  • Initiate a planned failover of the failover group from the secondary server.
  • Follow the Configure a database after recovery guide to complete the recovery.
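The planned failover can be initiated with the Azure CLI command `az sql failover-group set-primary`, run against the secondary server. Omitting `--allow-data-loss` requests a planned, synchronized failover rather than a forced one. The sketch below only builds the command; the group, resource group, and server names are placeholders, so verify the options against your installed CLI version.

```python
import shlex
from typing import List


def planned_failover_command(group: str, resource_group: str, secondary_server: str) -> List[str]:
    """Build the Azure CLI command for a planned failover of a failover group.

    The command targets the secondary server, which becomes the new primary.
    --allow-data-loss is deliberately omitted: adding it would request a
    forced failover, which can lose unreplicated transactions.
    """
    return [
        "az", "sql", "failover-group", "set-primary",
        "--name", group,
        "--resource-group", resource_group,
        "--server", secondary_server,
    ]


if __name__ == "__main__":
    print(shlex.join(planned_failover_command("contoso-fog", "drill-rg", "contoso-dr")))
```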

Validation

Complete the drill by verifying application integrity after recovery, including connectivity, basic functionality tests, and any other validations required to sign off on the drill.
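A first connectivity check can be a plain TCP connection to the database endpoint on port 1433. This is a minimal sketch: reaching the port shows that the network path to the server is open, but it does not prove that logins or queries work, so it complements rather than replaces the functional tests. The function name and default timeout are assumptions.

```python
import socket


def can_connect(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This checks network reachability only; it does not authenticate or run queries.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

For example, `can_connect("<server-name>.database.windows.net")` against the server the application now uses.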

Next steps