Fail over and fail back physical servers replicated to Azure

This tutorial describes how to fail over on-premises physical servers that are replicating to Azure with Azure Site Recovery. After failover, you can fail back from Azure to your on-premises site when it's available.

Before you start

  • Learn about the failover process in disaster recovery.
  • If you want to fail over multiple machines, learn how to gather machines together in a recovery plan.
  • Before you do a full failover, run a disaster recovery drill to ensure that everything is working as expected.
  • Follow these instructions to prepare to connect to Azure VMs after failover.

Run a failover

Verify server properties

Verify the server properties, and make sure that the server complies with Azure requirements for Azure VMs.

  1. In Protected Items, click Replicated Items, and select the machine.

  2. In the Replicated item pane, there's a summary of machine information, health status, and the latest available recovery points. Click Properties to view more details.

  3. In Compute and Network, you can modify the Azure name, resource group, target size, availability set, and managed disk settings.

  4. You can view and modify network settings, including the network/subnet in which the Azure VM will be located after failover, and the IP address that will be assigned to it.

  5. In Disks, you can see information about the machine operating system and data disks.

Fail over to Azure

  1. In Protected items > Replicated items, click the machine > Failover.

  2. In Failover, select a Recovery Point to fail over to. You can use one of the following options:

    • Latest: This option first processes all the data sent to Site Recovery. It provides the lowest RPO (Recovery Point Objective) because the Azure VM created after failover has all the data that was replicated to Site Recovery when the failover was triggered.
    • Latest processed: This option fails over the machine to the latest recovery point processed by Site Recovery. This option provides a low RTO (Recovery Time Objective), because no time is spent processing unprocessed data.
    • Latest app-consistent: This option fails over the machine to the latest app-consistent recovery point processed by Site Recovery.
    • Custom: Specify a recovery point.
  3. Select Shut down machine before beginning failover if you want Site Recovery to try to shut down the source machine before triggering the failover. Failover continues even if shutdown fails. You can follow the failover progress on the Jobs page.

  4. If you prepared to connect to the Azure VM, connect to validate it after the failover.

  5. After you verify, Commit the failover. This deletes all the available recovery points.

Warning

Don't cancel a failover in progress. Before failover begins, machine replication stops. If you cancel the failover, it stops, but the machine won't replicate again. For physical servers, additional failover processing can take around eight to ten minutes to complete.
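The four recovery point options in step 2 can be read as a selection rule over the available points. The following is a minimal sketch of that rule, not Site Recovery code: the `RecoveryPoint` class, its fields, the option strings, and `choose_recovery_point` are all hypothetical names introduced for illustration, and matching a custom point by its creation time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical model of a recovery point; not a Site Recovery type."""
    created: datetime
    processed: bool        # already processed by Site Recovery
    app_consistent: bool   # application-consistent rather than crash-consistent


def choose_recovery_point(points: List[RecoveryPoint], option: str,
                          custom: Optional[datetime] = None) -> RecoveryPoint:
    """Pick a point following the four options described in the text."""
    if not points:
        raise ValueError("no recovery points available")
    newest_first = sorted(points, key=lambda p: p.created, reverse=True)
    if option == "latest":
        # Pending data is processed first, so the newest point is used
        # even if it has not been processed yet (lowest RPO).
        return newest_first[0]
    if option == "latest_processed":
        # No extra processing time is spent (low RTO).
        candidates = [p for p in newest_first if p.processed]
    elif option == "latest_app_consistent":
        candidates = [p for p in newest_first
                      if p.processed and p.app_consistent]
    elif option == "custom":
        # Assumption: a custom point is identified by its creation time.
        candidates = [p for p in newest_first if p.created == custom]
    else:
        raise ValueError(f"unknown option: {option}")
    if not candidates:
        raise LookupError(f"no recovery point matches option {option!r}")
    return candidates[0]


# Illustrative data only.
points = [
    RecoveryPoint(datetime(2024, 1, 1, 10, 0), processed=True, app_consistent=True),
    RecoveryPoint(datetime(2024, 1, 1, 10, 30), processed=True, app_consistent=False),
    RecoveryPoint(datetime(2024, 1, 1, 10, 45), processed=False, app_consistent=False),
]
```

With this sample data, "latest" selects the 10:45 point, "latest_processed" the 10:30 point, and "latest_app_consistent" the 10:00 point, which shows the trade-off between data currency and processing time.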

Automate actions during failover

You might want to automate actions during failover. To do this, you can use scripts or Azure Automation runbooks in recovery plans.

  • Learn about creating and customizing recovery plans, including adding scripts.

Configure settings after failover

After failover, you need to configure Azure settings to connect to the replicated Azure VMs. In addition, set up internal and public IP addressing.

Prepare for reprotection and failback

After failing over to Azure, you reprotect Azure VMs by replicating them to the on-premises site. Once they're replicating, you can fail them back by running a failover from Azure to your on-premises site.

  1. Physical servers replicated to Azure using Site Recovery can only fail back as VMware VMs. You need a VMware infrastructure in order to fail back. Follow the steps in this article to prepare for reprotection and failback, including setting up a process server in Azure and an on-premises master target server, and configuring a site-to-site VPN or ExpressRoute private peering for failback.
  2. Make sure that the on-premises configuration server is running and connected to Azure. During failover to Azure, the on-premises site might not be accessible, and the configuration server might be unavailable or shut down. During failback, the VM must exist in the configuration server database. Otherwise, failback is unsuccessful.
  3. Delete any snapshots on the on-premises master target server. Reprotection won't work if there are snapshots. The snapshots on the VM are automatically merged during a reprotect job.
  4. If you're reprotecting VMs gathered into a replication group for multi-VM consistency, make sure they all have the same operating system (Windows or Linux), and make sure that the master target server you deploy has the same type of operating system. All VMs in a replication group must use the same master target server.
  5. Open the required ports for failback.
  6. Ensure that the vCenter Server is connected before failback. Otherwise, disconnecting disks and attaching them back to the virtual machine fails.
  7. If a vCenter Server manages the VMs to which you'll fail back, make sure that you have the required permissions. If you discover and protect virtual machines with a read-only vCenter user, protection succeeds, and failover works. However, reprotection fails because the datastores can't be discovered and so aren't listed during reprotection. To resolve this problem, update the vCenter credentials with an appropriate account/permissions, and then retry the job.
  8. If you used a template to create your virtual machines, ensure that each VM has its own UUID for the disks. If the on-premises VM UUID clashes with the UUID of the master target server because both were created from the same template, reprotection fails. Deploy from a different template.
  9. If you're failing back to an alternate vCenter Server, make sure that the new vCenter Server and the master target server are discovered. If they aren't, the datastores typically aren't accessible or aren't visible in Reprotect.
  10. Verify the following scenarios in which you can't fail back:
    • If you're using either the ESXi 5.5 free edition or the vSphere 6 Hypervisor free edition. Upgrade to a different version.
    • If you have a Windows Server 2008 R2 SP1 physical server.
    • VMs that have been migrated.
    • A VM that's been moved to another resource group.
    • A replica Azure VM that's been deleted.
    • A replica Azure VM that isn't protected (replicating to the on-premises site).
  11. Review the types of failback you can use: original location recovery and alternate location recovery.
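Several items in the checklist above are simple yes/no conditions, so they can be collected into a preflight check that is run before reprotection. The sketch below is illustrative only: `FailbackCandidate`, `MasterTarget`, their fields, and `failback_blockers` are hypothetical names, the values would have to be gathered by hand or by your own tooling, and the checks cover only a subset of the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailbackCandidate:
    """Hypothetical facts about one failed-over machine."""
    name: str
    os_type: str                      # "Windows" or "Linux"
    disk_uuid: str
    config_server_connected: bool = True
    vcenter_connected: bool = True
    replica_vm_exists: bool = True
    replica_vm_protected: bool = True
    was_migrated: bool = False
    moved_to_other_resource_group: bool = False


@dataclass
class MasterTarget:
    """Hypothetical facts about the on-premises master target server."""
    os_type: str
    disk_uuid: str
    has_snapshots: bool = False


def failback_blockers(vms: List[FailbackCandidate],
                      mt: MasterTarget) -> List[str]:
    """Return human-readable reasons why reprotection or failback would fail."""
    problems: List[str] = []
    if mt.has_snapshots:
        problems.append("master target server has snapshots")
    if len({vm.os_type for vm in vms}) > 1:
        problems.append("replication group mixes operating systems")
    for vm in vms:
        checks = [
            (not vm.config_server_connected, "configuration server is not connected"),
            (not vm.vcenter_connected, "vCenter Server is not connected"),
            (vm.os_type != mt.os_type, "operating system differs from master target"),
            (vm.disk_uuid == mt.disk_uuid, "disk UUID clashes with master target"),
            (vm.was_migrated, "VM has been migrated"),
            (vm.moved_to_other_resource_group, "VM was moved to another resource group"),
            (not vm.replica_vm_exists, "replica Azure VM has been deleted"),
            (not vm.replica_vm_protected, "replica Azure VM is not protected"),
        ]
        problems.extend(f"{vm.name}: {msg}" for failed, msg in checks if failed)
    return problems


# Illustrative data only.
master_target = MasterTarget(os_type="Linux", disk_uuid="uuid-mt")
healthy = FailbackCandidate("web01", "Linux", "uuid-a")
cloned = FailbackCandidate("db01", "Windows", "uuid-mt", vcenter_connected=False)
```

An empty list from `failback_blockers([healthy], master_target)` means none of the modeled conditions applies. It does not prove that failback will succeed, because ports, permissions, and hypervisor edition are not modeled here.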

Reprotect Azure VMs to an alternate location

This procedure presumes that the on-premises VM isn't available.

  1. In the vault > Protected items > Replicated items, right-click the machine that was failed over > Re-Protect.

  2. In Re-protect, verify that Azure to On-premises is selected.

  3. Specify the on-premises master target server, and the process server.

  4. In Datastore, select the master target datastore to which you want to recover the disks on-premises.

    • Use this option if the on-premises VM has been deleted or doesn't exist, and you need to create new disks.
    • This setting is ignored if the disks already exist, but you do need to specify a value.
  5. Select the master target retention drive. The failback policy is automatically selected.

  6. Click OK to begin reprotection. A job begins to replicate the Azure VM to the on-premises site. You can track the progress on the Jobs tab.

Note

If you want to recover the Azure VM to an existing on-premises VM, mount the on-premises virtual machine's datastore with read/write access on the master target server's ESXi host.

Fail back from Azure

Run the failover as follows:

  1. On the Replicated Items page, right-click the machine > Unplanned Failover.
  2. In Confirm Failover, verify that the failover direction is from Azure.
  3. Select the recovery point that you want to use for the failover.
    • We recommend that you use the Latest recovery point. The app-consistent point is behind the latest point in time, so using it causes some data loss.
    • Latest is a crash-consistent recovery point.
    • When failover runs, Site Recovery shuts down the Azure VMs and boots up the on-premises VM. There will be some downtime, so choose an appropriate time.
  4. Right-click the machine, and click Commit. This triggers a job that removes the Azure VMs.
  5. Verify that Azure VMs have been shut down as expected.

Reprotect on-premises machines to Azure

Data should now be back on your on-premises site, but it isn't replicating to Azure. You can start replicating to Azure again as follows:

  1. In the vault > Protected Items > Replicated Items, select the VMs that have failed back, and click Re-Protect.

  2. Select the process server that is used to send the replicated data to Azure, and click OK.
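Taken together, the procedures in this tutorial form a cycle: fail over, commit, reprotect toward on-premises, fail back, commit, and reprotect toward Azure. The sketch below models that cycle as a small state machine. The state names, action strings, and functions are hypothetical, and the strict ordering mirrors the order of the sections here rather than any rule enforced by the portal.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    REPLICATING_TO_AZURE = "replicating to Azure"
    FAILED_OVER = "failed over to Azure"
    RUNNING_IN_AZURE = "running in Azure (failover committed)"
    REPLICATING_TO_ON_PREMISES = "replicating to on-premises"
    FAILED_BACK = "failed back to on-premises"
    RUNNING_ON_PREMISES = "running on-premises (failback committed)"


# Allowed (state, action) pairs, in the order the tutorial walks through them.
TRANSITIONS = {
    (State.REPLICATING_TO_AZURE, "failover"): State.FAILED_OVER,
    (State.FAILED_OVER, "commit"): State.RUNNING_IN_AZURE,
    (State.RUNNING_IN_AZURE, "reprotect"): State.REPLICATING_TO_ON_PREMISES,
    (State.REPLICATING_TO_ON_PREMISES, "failover"): State.FAILED_BACK,
    (State.FAILED_BACK, "commit"): State.RUNNING_ON_PREMISES,
    (State.RUNNING_ON_PREMISES, "reprotect"): State.REPLICATING_TO_AZURE,
}


def advance(state: State, action: str) -> State:
    """Apply one action, rejecting steps taken out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} while {state.value}") from None


def run(actions: Iterable[str],
        start: State = State.REPLICATING_TO_AZURE) -> State:
    """Apply a sequence of actions and return the final state."""
    state = start
    for action in actions:
        state = advance(state, action)
    return state


full_cycle = ["failover", "commit", "reprotect",
              "failover", "commit", "reprotect"]
final_state = run(full_cycle)
```

Running the full cycle returns the machine to `State.REPLICATING_TO_AZURE`, which is the point at which another failover to Azure becomes possible.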

Next steps

After the reprotect job finishes, the on-premises VM is replicating to Azure. As needed, you can run another failover to Azure.