VMware to Azure disaster recovery architecture

This article describes the architecture and processes used when you deploy disaster recovery replication, failover, and recovery of VMware virtual machines (VMs) between an on-premises VMware site and Azure using the Azure Site Recovery service.

Architectural components

The following table and graphic provide a high-level view of the components used for VMware disaster recovery to Azure.

| Component | Requirement | Details |
| --- | --- | --- |
| Azure | An Azure subscription, an Azure Storage account for cache, managed disks, and an Azure network. | Replicated data from on-premises VMs is stored in Azure storage. Azure VMs are created with the replicated data when you run a failover from on-premises to Azure. The Azure VMs connect to the Azure virtual network when they're created. |
| Configuration server machine | A single on-premises machine. We recommend that you run it as a VMware VM that can be deployed from a downloaded OVF template. | The machine runs all on-premises Site Recovery components, which include the configuration server, process server, and master target server. Configuration server: Coordinates communications between on-premises and Azure, and manages data replication. Process server: Installed by default on the configuration server. It receives replication data; optimizes it with caching, compression, and encryption; and sends it to Azure Storage. The process server also installs the Azure Site Recovery Mobility Service on the VMs you want to replicate, and performs automatic discovery of on-premises machines. As your deployment grows, you can add separate process servers to handle larger volumes of replication traffic. Master target server: Installed by default on the configuration server. It handles replication data during failback from Azure. For large deployments, you can add a separate master target server for failback. |
| VMware servers | VMware VMs are hosted on on-premises vSphere ESXi servers. We recommend a vCenter server to manage the hosts. | During Site Recovery deployment, you add VMware servers to the Recovery Services vault. |
| Replicated machines | Mobility Service is installed on each VMware VM that you replicate. | We recommend that you allow automatic installation from the process server. Alternatively, you can install the service manually or use an automated deployment method, such as System Center Configuration Manager. |

[Figure: VMware to Azure architecture components]

Replication process

  1. When you enable replication for a VM, initial replication to Azure storage begins, using the specified replication policy. Note the following:

    • For VMware VMs, replication is block-level and near-continuous, using the Mobility service agent running on the VM.
    • Any replication policy settings are applied:
      • RPO threshold. This setting does not affect replication; it helps with monitoring. If the current RPO exceeds the threshold limit that you specify, an event is raised and, optionally, an email is sent.
      • Recovery point retention. This setting specifies how far back in time you want to go when a disruption occurs. Maximum retention is 24 hours on premium storage and 72 hours on standard storage.
      • App-consistent snapshots. App-consistent snapshots can be taken every 1 to 12 hours, depending on your app needs. Snapshots are standard Azure blob snapshots. The Mobility agent running on a VM requests a VSS snapshot in accordance with this setting, and bookmarks that point in time as an application-consistent point in the replication stream.
  2. Traffic replicates to Azure storage public endpoints over the internet. Alternatively, you can use Azure ExpressRoute with public peering. Replicating traffic over a site-to-site virtual private network (VPN) from an on-premises site to Azure isn't supported.

  3. After initial replication finishes, replication of delta changes to Azure begins. Tracked changes for a machine are sent to the process server.

  4. Communication happens as follows:

    • VMs communicate with the on-premises configuration server over HTTPS on port 443 (inbound) for replication management.
    • The configuration server orchestrates replication with Azure over HTTPS on port 443 (outbound).
    • VMs send replication data to the process server (running on the configuration server machine) over HTTPS on port 9443 (inbound). This port can be modified.
    • The process server receives replication data, optimizes and encrypts it, and sends it to Azure storage over port 443 (outbound).
  5. The replication data logs first land in a cache storage account in Azure. These logs are processed and the data is stored in an Azure managed disk (called the asr seed disk). The recovery points are created on this disk.
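
The replication policy limits from step 1 can be expressed as a small validation routine. The following Python sketch is illustrative only: `ReplicationPolicy` and its field names are hypothetical and are not part of any Site Recovery SDK. Only the numeric limits (24 and 72 hours of retention, 1 to 12 hours between app-consistent snapshots) come from the text above.

```python
from dataclasses import dataclass

# Limits stated in the replication policy settings above.
MAX_RETENTION_HOURS = {"premium": 24, "standard": 72}
APP_SNAPSHOT_MIN_HOURS = 1
APP_SNAPSHOT_MAX_HOURS = 12


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical container for the three policy settings (not an Azure API type)."""
    rpo_threshold_minutes: int
    retention_hours: int
    app_snapshot_interval_hours: int
    storage_tier: str = "standard"  # "premium" or "standard"

    def validate(self):
        """Return a list of problems; an empty list means the settings are within limits."""
        problems = []
        max_retention = MAX_RETENTION_HOURS.get(self.storage_tier)
        if max_retention is None:
            problems.append(f"unknown storage tier: {self.storage_tier}")
        elif self.retention_hours > max_retention:
            problems.append(
                f"retention of {self.retention_hours} h exceeds the "
                f"{max_retention} h maximum for {self.storage_tier} storage"
            )
        if not (APP_SNAPSHOT_MIN_HOURS
                <= self.app_snapshot_interval_hours
                <= APP_SNAPSHOT_MAX_HOURS):
            problems.append("app-consistent snapshot interval must be 1 to 12 hours")
        return problems

    def rpo_exceeded(self, current_rpo_minutes):
        """Monitoring check only: the threshold raises an alert, it does not change replication."""
        return current_rpo_minutes > self.rpo_threshold_minutes
```

For example, a 48-hour retention window passes validation on standard storage but is reported as a problem on premium storage.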

[Figure: VMware to Azure replication process]
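
The communication paths in step 4 can be summarized as data, which is handy when working out which ports to open. This is a minimal sketch, assuming the default ports listed above; the `Flow` type and both function names are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    source: str
    destination: str
    port: int
    direction: str  # "inbound" or "outbound", as labeled in the list above
    purpose: str


DEFAULT_REPLICATION_PORT = 9443  # the text notes that this port can be modified


def replication_flows(replication_port=DEFAULT_REPLICATION_PORT):
    """Return the four communication paths described in step 4."""
    return [
        Flow("VM", "configuration server", 443, "inbound",
             "replication management"),
        Flow("configuration server", "Azure", 443, "outbound",
             "replication orchestration"),
        Flow("VM", "process server", replication_port, "inbound",
             "replication data"),
        Flow("process server", "Azure storage", 443, "outbound",
             "optimized and encrypted replication data"),
    ]


def required_ports(flows, direction):
    """Return the sorted, de-duplicated ports used in one direction."""
    return sorted({flow.port for flow in flows if flow.direction == direction})
```

Calling `required_ports(replication_flows(), "inbound")` yields the two inbound ports, 443 and 9443, while passing a different `replication_port` reflects a customized process server port.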

Failover and failback process

After replication is set up and you have run a disaster recovery drill (test failover) to check that everything works as expected, you can run failover and failback as needed.

  1. You can run a failover for a single machine, or create a recovery plan to fail over multiple VMs at the same time. The advantages of a recovery plan over single-machine failover include:

    • You can model app dependencies by including all the VMs across the app in a single recovery plan.
    • You can add scripts, Azure runbooks, and pauses for manual actions.
  2. After triggering the initial failover, you commit it to start accessing the workload from the Azure VM.

  3. When your primary on-premises site is available again, you can prepare for failback. To fail back, you need to set up a failback infrastructure, including:

    • Temporary process server in Azure: To fail back from Azure, you set up an Azure VM to act as a process server to handle replication from Azure. You can delete this VM after failback finishes.
    • VPN connection: To fail back, you need a VPN connection (or ExpressRoute) from the Azure network to the on-premises site.
    • Separate master target server: By default, the master target server that was installed with the configuration server on the on-premises VMware VM handles failback. If you need to fail back large volumes of traffic, set up a separate on-premises master target server for this purpose.
    • Failback policy: To replicate back to your on-premises site, you need a failback policy. This policy is automatically created when you create a replication policy from on-premises to Azure.
  4. After the components are in place, failback occurs in three stages:

    • Stage 1: Reprotect the Azure VMs so that they replicate from Azure back to the on-premises VMware VMs.
    • Stage 2: Run a failover to the on-premises site.
    • Stage 3: After workloads have failed back, reenable replication for the on-premises VMs.
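
The order of operations above (failover, commit, then the three failback stages) can be modeled as a simple state machine. The sketch below is a deliberate simplification for illustration: the state and action names are hypothetical, and the real service tracks more states than are shown here.

```python
from enum import Enum


class State(Enum):
    PROTECTED = "on-premises VMs replicating to Azure"
    FAILED_OVER = "failed over to Azure, not yet committed"
    COMMITTED = "failover committed; workload runs on the Azure VM"
    REPROTECTED = "Azure VMs replicating back to on-premises"
    FAILED_BACK = "failed over to the on-premises site"


# Allowed (state, action) pairs, in the order described in the text.
TRANSITIONS = {
    (State.PROTECTED, "failover"): State.FAILED_OVER,
    (State.FAILED_OVER, "commit"): State.COMMITTED,
    (State.COMMITTED, "reprotect"): State.REPROTECTED,                  # failback stage 1
    (State.REPROTECTED, "failover_to_on_premises"): State.FAILED_BACK,  # failback stage 2
    (State.FAILED_BACK, "reenable_replication"): State.PROTECTED,       # failback stage 3
}


def apply(state, action):
    """Return the next state, or raise ValueError if the action is out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid while {state.name}") from None


def run(actions, state=State.PROTECTED):
    """Apply a sequence of actions and return the final state."""
    for action in actions:
        state = apply(state, action)
    return state
```

Running all five actions in order returns the machine to `State.PROTECTED`, whereas attempting `reprotect` before a failover has been committed raises a `ValueError`.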

[Figure: VMware failback from Azure]

Next steps

Follow this tutorial to enable VMware to Azure replication.