About recovery plans

This article provides an overview of recovery plans in Azure Site Recovery.

A recovery plan gathers machines into recovery groups for the purpose of failover. It helps you define a systematic recovery process by creating small, independent units that you can fail over. A unit typically represents an app in your environment.

  • A recovery plan defines how machines fail over, and the sequence in which they start after failover.
  • Recovery plans can be used for both failover to Azure and failback from Azure.
  • Up to 100 protected instances can be added to one recovery plan.
  • You can customize a plan by adding order, instructions, and tasks to it.
  • After a plan is defined, you can run a failover on it.
  • A machine can be referenced in multiple recovery plans. If a machine was already deployed by one recovery plan, subsequent plans skip its deployment and startup.

Why use a recovery plan?

Use recovery plans to:

  • Model an app around its dependencies.
  • Automate recovery tasks to reduce the recovery time objective (RTO).
  • Verify that you're prepared for migration or disaster recovery by ensuring that your apps are part of a recovery plan.
  • Run test failovers on recovery plans to ensure that disaster recovery or migration works as expected.

Model apps

You can plan and create a recovery group to capture app-specific properties. As an example, consider a typical three-tier application with a SQL Server backend, middleware, and a web frontend. Typically, you customize the recovery plan so that the machines in each tier start in the correct order after failover.

  • The SQL backend should start first, the middleware next, and finally the web frontend.
  • This start order ensures that the app is working by the time the last machine starts.
  • This order ensures that when the middleware starts and tries to connect to the SQL Server tier, the SQL Server tier is already running.
  • This order also helps ensure that the front-end server starts last, so that end users don't connect to the app URL before all the components are up and running and the app is ready to accept requests.
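The start order above follows directly from the dependencies between tiers. As a minimal sketch, assuming hypothetical tier names and a plain dependency map (neither is part of Site Recovery), the grouping can be derived with a topological sort from the Python standard library:

```python
from graphlib import TopologicalSorter

# Hypothetical tier names; each key depends on the tiers in its set.
dependencies = {
    "sql-backend": set(),
    "middleware": {"sql-backend"},
    "web-frontend": {"middleware"},
}

def startup_groups(deps):
    """Return ordered groups; each tier's dependencies sit in earlier groups."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies already started.
        ready = sorted(sorter.get_ready())
        groups.append(ready)
        sorter.done(*ready)
    return groups

for number, members in enumerate(startup_groups(dependencies), start=1):
    print(f"Group {number}: {', '.join(members)}")
# Group 1: sql-backend
# Group 2: middleware
# Group 3: web-frontend
```

Tiers that don't depend on each other land in the same group, which matches the idea that machines in one group can start together.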

To create this order, you add groups to the recovery plan, and then add machines to the groups.

  • Where order is specified, sequencing is used. Actions run in parallel where appropriate, to improve the application recovery RTO.

  • Machines in a single group fail over in parallel.

  • Machines in different groups fail over in group order, so that Group 2 machines start their failover only after all the machines in Group 1 have failed over and started.

    Example recovery plan

With this customization in place, here's what happens when you run a failover on the recovery plan:

  1. A shutdown step attempts to turn off the on-premises machines. The exception is a test failover, in which case the primary site continues to run.
  2. The shutdown triggers a parallel failover of all the machines in the recovery plan.
  3. The failover prepares virtual machine disks using replicated data.
  4. The startup groups run in order, and start the machines in each group. First, Group 1 runs, then Group 2, and finally Group 3. If a group contains more than one machine, all of its machines start in parallel.
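The four steps above can be sketched as a small simulation. This is an illustration of the ordering only, not the Site Recovery implementation: `run_plan`, the event strings, and the machine names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def run_plan(groups, start_machine, test_failover=False):
    """Simulate the step order of a recovery plan and return the event log."""
    events = []
    lock = threading.Lock()

    def record(event):
        with lock:
            events.append(event)

    # Step 1: shutdown is attempted unless this is a test failover.
    record("shutdown skipped (test failover)" if test_failover else "shutdown attempted")

    # Steps 2-3: every machine in the plan fails over, and disks are prepared.
    machine_count = sum(len(group) for group in groups)
    record(f"failover prepared disks for {machine_count} machines")

    # Step 4: groups run in order; machines inside one group start in parallel.
    for number, group in enumerate(groups, start=1):
        def start(machine):
            start_machine(machine)
            record(f"group {number}: started {machine}")
        # Leaving the with-block waits for the whole group before the next one.
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            list(pool.map(start, group))
    return events

plan = [["sql-1"], ["mid-1", "mid-2"], ["web-1"]]
print("\n".join(run_plan(plan, start_machine=lambda name: None)))
```

The order of events inside a group can vary between runs, because those machines start concurrently. The order between groups is fixed.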

Automate tasks in recovery plans

Recovering large applications can be a complex task. Manual steps make the process prone to error, and the person running the failover might not be aware of all the app's intricacies. You can use a recovery plan to impose order and automate the actions needed at each step, using Azure Automation runbooks for failover to Azure, or scripts. For tasks that can't be automated, you can insert pauses for manual actions into recovery plans. You can configure two types of tasks:

  • Tasks on the Azure VM after failover: When you fail over to Azure, you typically need to perform actions so that you can connect to the VM after failover. For example:
    • Create a public IP address on the Azure VM.
    • Assign a network security group to the network adapter of the Azure VM.
    • Add a load balancer to an availability set.
  • Tasks inside the VM after failover: These tasks typically reconfigure the app running on the machine so that it continues to work correctly in the new environment. For example:
    • Modify the database connection string inside the machine.
    • Change the web server configuration or rules.
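As an illustration of an in-VM task such as modifying a database connection string, the following is a minimal sketch of a script you might run after failover. The function name, key names, and host names are assumptions made for the example, and the parsing is deliberately naive: it doesn't handle quoted values that contain semicolons.

```python
def retarget_connection_string(connection_string, new_server):
    """Return the connection string with its Server/Data Source value replaced."""
    parts = []
    replaced = False
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # Skip empty segments, such as after a trailing semicolon.
        key, _, _value = part.partition("=")
        if key.strip().lower() in ("server", "data source"):
            part = f"{key}={new_server}"
            replaced = True
        parts.append(part)
    if not replaced:
        raise ValueError("no Server or Data Source key found")
    return ";".join(parts)

# Hypothetical host names for the primary and recovered database servers.
before = "Server=sql01.example.local;Database=Orders;Integrated Security=True"
print(retarget_connection_string(before, "sql01-recovered.example.internal"))
# Server=sql01-recovered.example.internal;Database=Orders;Integrated Security=True
```

Raising an error when no server key is found keeps a failed rewrite from passing silently during a failover.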

Run a test failover on recovery plans

You can use a recovery plan to trigger a test failover. Use the following best practices:

  • Always complete a test failover on an app before running a full failover. Test failovers help you check whether the app comes up on the recovery site.

  • If you find that you've missed something, trigger a clean-up, and then rerun the test failover.

  • Run a test failover multiple times, until you're sure that the app recovers smoothly.

  • Because each app is unique, you need to build a recovery plan that is customized for each app, and run a test failover on each.

  • Apps and their dependencies change frequently. To ensure that recovery plans are up to date, run a test failover for each app every quarter.

    Screenshot of an example test recovery plan in Site Recovery
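To keep to the quarterly cadence, it can help to track when each app last had a test failover. The following is a minimal sketch, assuming you record those dates yourself; the app names, the dates, and the 90-day approximation of a quarter are all assumptions for the example.

```python
from datetime import date, timedelta

# Assumption: "every quarter" is approximated as 90 days.
QUARTER = timedelta(days=90)

def overdue_apps(last_test_failover, today):
    """Return apps never tested (None) or last tested more than a quarter ago."""
    return sorted(
        app
        for app, last in last_test_failover.items()
        if last is None or today - last > QUARTER
    )

# Hypothetical record of the last test failover per app.
history = {
    "orders": date(2024, 1, 10),
    "billing": date(2024, 5, 2),
    "reports": None,
}
print(overdue_apps(history, today=date(2024, 6, 1)))
# ['orders', 'reports']
```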

Next steps