Azure to Azure disaster recovery architecture

This article describes the architecture, components, and processes used when you deploy disaster recovery for Azure virtual machines (VMs) using the Azure Site Recovery service. With disaster recovery set up, Azure VMs continuously replicate to a different target region. If an outage occurs, you can fail over VMs to the secondary region, and access them from there. When everything's running normally again, you can fail back and continue working in the primary location.

Architectural components

The components involved in disaster recovery for Azure VMs are summarized in the following table.

| Component | Requirements |
| --- | --- |
| VMs in source region | One or more Azure VMs in a supported source region. VMs can run any supported operating system. |
| Source VM storage | Azure VMs can use managed disks, or unmanaged disks spread across storage accounts. Learn about supported Azure storage. |
| Source VM networks | VMs can be located in one or more subnets in a virtual network (VNet) in the source region. Learn more about networking requirements. |
| Cache storage account | You need a cache storage account in the source region. During replication, VM changes are stored in the cache before being sent to target storage. Cache storage accounts must be Standard. Using a cache minimizes the impact on production applications that are running on a VM. Learn more about cache storage requirements. |
| Target resources | Target resources are used during replication, and when a failover occurs. Site Recovery can set up target resources by default, or you can create and customize them. In the target region, check that you're able to create VMs, and that your subscription has enough resources to support the VM sizes that will be needed there. |

Figure: Source and target replication

Target resources

When you enable replication for a VM, Site Recovery gives you the option of creating target resources automatically.

| Target resource | Default setting |
| --- | --- |
| Target subscription | Same as the source subscription. |
| Target resource group | The resource group to which VMs belong after failover. It can be in any Azure region except the source region. Site Recovery creates a new resource group in the target region, with an "asr" suffix. |
| Target VNet | The virtual network (VNet) in which replicated VMs are located after failover. A network mapping is created between source and target virtual networks, and vice versa. Site Recovery creates a new VNet and subnet, with the "asr" suffix. |
| Target storage account | If the VM doesn't use managed disks, this is the storage account to which data is replicated. Site Recovery creates a new storage account in the target region, to mirror the source storage account. |
| Replica managed disks | If the VM uses managed disks, these are the managed disks to which data is replicated. Site Recovery creates replica managed disks in the target region to mirror the source. |
| Target availability sets | The availability set in which replicated VMs are located after failover. For VMs that are located in an availability set in the source location, Site Recovery creates an availability set in the target region with the suffix "asr". If an availability set already exists, it's used and a new one isn't created. |
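The default naming and the reuse rule for availability sets can be illustrated with a minimal Python sketch. The table only states that an "asr" suffix is used; the hyphen separator and both function names below are assumptions made for illustration, not Site Recovery behavior confirmed by this article.

```python
from typing import Iterable, Optional, Tuple

ASR_SUFFIX = "asr"


def default_target_name(source_name: str, separator: str = "-") -> str:
    """Append the "asr" suffix to a source resource name.

    The separator is an assumption; the article only says an "asr" suffix is used.
    """
    return f"{source_name}{separator}{ASR_SUFFIX}"


def plan_target_availability_set(
    source_set: Optional[str], existing_target_sets: Iterable[str]
) -> Tuple[Optional[str], bool]:
    """Return (target set name, whether it must be created).

    A VM outside any availability set needs no target set. Otherwise an
    existing target set is reused, and a new one is created only if missing.
    """
    if source_set is None:
        return None, False
    name = default_target_name(source_set)
    return name, name not in set(existing_target_sets)


if __name__ == "__main__":
    print(default_target_name("web-rg"))
    print(plan_target_availability_set("web-avset", ["web-avset-asr"]))
```

The second call reports that no new availability set is needed, because one with the derived name already exists in the (hypothetical) target region.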

Managing target resources

You can manage target resources as follows:

  • You can modify target settings as you enable replication.
  • You can modify target settings after replication is already working. The exception is the availability type (single instance, set, or zone). To change this setting, you need to disable replication, modify the setting, and then reenable replication.

Replication policy

When you enable Azure VM replication, by default Site Recovery creates a new replication policy with the default settings summarized in the table.

| Policy setting | Details | Default |
| --- | --- | --- |
| Recovery point retention | Specifies how long Site Recovery keeps recovery points. | 24 hours |
| App-consistent snapshot frequency | How often Site Recovery takes an app-consistent snapshot. | Every four hours |

Managing replication policies

You can manage and modify the default replication policy settings as follows:

  • You can modify the settings as you enable replication.
  • You can create a replication policy at any time, and then apply it when you enable replication.
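As a rough illustration of these settings, the sketch below models a replication policy in Python with the defaults from the table. The class and field names are hypothetical, and the check that the app-consistent frequency stays below the retention period reflects the guidance given in the consistency section of this article, not an API rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical model of the two policy settings described above."""

    recovery_point_retention_hours: int = 24   # default: 24 hours
    app_consistent_frequency_hours: int = 4    # default: every four hours

    def validate(self) -> None:
        """Raise ValueError if the settings contradict the article's guidance."""
        if self.recovery_point_retention_hours <= 0:
            raise ValueError("retention must be positive")
        if self.app_consistent_frequency_hours <= 0:
            raise ValueError("snapshot frequency must be positive")
        if self.app_consistent_frequency_hours >= self.recovery_point_retention_hours:
            raise ValueError("snapshot frequency should be less than the retention period")


if __name__ == "__main__":
    ReplicationPolicy().validate()  # the defaults pass
    try:
        ReplicationPolicy(24, 24).validate()
    except ValueError as err:
        print(f"rejected: {err}")
```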

Multi-VM consistency

If you want VMs to replicate together, and have shared crash-consistent and app-consistent recovery points at failover, you can gather them together into a replication group. Multi-VM consistency impacts workload performance, and should only be used for VMs running workloads that need consistency across all machines.

Snapshots and recovery points

Recovery points are created from snapshots of VM disks taken at a specific point in time. When you fail over a VM, you use a recovery point to restore the VM in the target location.

When failing over, we generally want to ensure that the VM starts with no corruption or data loss, and that the VM data is consistent for the operating system, and for apps that run on the VM. This depends on the type of snapshots taken.

Site Recovery takes snapshots as follows:

  1. Site Recovery takes crash-consistent snapshots of data by default, and app-consistent snapshots if you specify a frequency for them.
  2. Recovery points are created from the snapshots, and stored in accordance with retention settings in the replication policy.
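The effect of the retention setting can be sketched in Python as a simple filter over recovery points. The `RecoveryPoint` type and the `within_retention` function are hypothetical names used for illustration; how Site Recovery stores and prunes points internally is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical record of one recovery point."""

    taken_at: datetime
    app_consistent: bool = False  # False means crash-consistent


def within_retention(
    points: Iterable[RecoveryPoint],
    now: datetime,
    retention: timedelta = timedelta(hours=24),  # default from the policy table
) -> List[RecoveryPoint]:
    """Keep only the points that fall inside the retention window ending at `now`."""
    cutoff = now - retention
    return [p for p in points if cutoff <= p.taken_at <= now]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    points = [
        RecoveryPoint(now - timedelta(hours=25)),
        RecoveryPoint(now - timedelta(hours=23)),
        RecoveryPoint(now - timedelta(hours=1), app_consistent=True),
    ]
    for point in within_retention(points, now):
        print(point)
```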

Consistency

The following sections explain the different types of consistency.

Crash-consistent

Description: A crash-consistent snapshot captures data that was on the disk when the snapshot was taken. It doesn't include anything in memory.

It contains the equivalent of the on-disk data that would be present if the VM crashed or the power cord was pulled from the server at the instant that the snapshot was taken.

A crash-consistent snapshot doesn't guarantee data consistency for the operating system, or for apps on the VM.

Details: Site Recovery creates crash-consistent recovery points every five minutes by default. This setting can't be modified.

Recommendation: Today, most apps can recover well from crash-consistent points.

Crash-consistent recovery points are usually sufficient for the replication of operating systems, and apps such as DHCP servers and print servers.

App-consistent

Description: App-consistent recovery points are created from app-consistent snapshots.

An app-consistent snapshot contains all the information in a crash-consistent snapshot, plus all the data in memory and transactions in progress.

Details: App-consistent snapshots use the Volume Shadow Copy Service (VSS):

1) When a snapshot is initiated, VSS performs a copy-on-write (COW) operation on the volume.

2) Before it performs the COW, VSS informs every app on the machine that it needs to flush its memory-resident data to disk.

3) VSS then allows the backup/disaster recovery app (in this case Site Recovery) to read the snapshot data and proceed.

Recommendation: App-consistent snapshots are taken in accordance with the frequency you specify. This frequency should always be less than the retention period you set for recovery points. For example, if you retain recovery points using the default setting of 24 hours, you should set the frequency at less than 24 hours.

They're more complex and take longer to complete than crash-consistent snapshots.

They affect the performance of apps running on a VM enabled for replication.

Replication process

When you enable replication for an Azure VM, the following happens:

  1. The Site Recovery Mobility service extension is automatically installed on the VM.

  2. The extension registers the VM with Site Recovery.

  3. Continuous replication begins for the VM. Disk writes are immediately transferred to the cache storage account in the source location.

  4. Site Recovery processes the data in the cache, and sends it to the target storage account, or to the replica managed disks.

  5. After the data is processed, crash-consistent recovery points are generated every five minutes. App-consistent recovery points are generated according to the setting specified in the replication policy.
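The cadence in step 5 can be made concrete with a small Python sketch that lists when recovery points would be generated over a time window. It assumes, purely for illustration, that both schedules are aligned to the moment replication starts and that the first point appears one interval later; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

CRASH_CONSISTENT_INTERVAL = timedelta(minutes=5)  # fixed, per the article


def recovery_point_schedule(
    start: datetime,
    duration: timedelta,
    app_consistent_every: timedelta = timedelta(hours=4),  # policy default
) -> Tuple[List[datetime], List[datetime]]:
    """Return (crash-consistent times, app-consistent times) within the window."""
    end = start + duration

    def ticks(step: timedelta) -> List[datetime]:
        times = []
        t = start + step  # assumption: first point one interval after start
        while t <= end:
            times.append(t)
            t += step
        return times

    return ticks(CRASH_CONSISTENT_INTERVAL), ticks(app_consistent_every)


if __name__ == "__main__":
    crash, app = recovery_point_schedule(datetime(2024, 1, 1), timedelta(hours=8))
    print(len(crash), "crash-consistent points;", len(app), "app-consistent points")
```

Under these assumptions, an eight-hour window yields 96 crash-consistent points and 2 app-consistent points with the default policy.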

Figure: Replication process (enabling replication, step 2)

Connectivity requirements

The Azure VMs you replicate need outbound connectivity. Site Recovery never needs inbound connectivity to the VM.

Outbound connectivity (URLs)

If outbound access for VMs is controlled with URLs, allow these URLs.

| URL | Details |
| --- | --- |
| *.blob.core.chinacloudapi.cn | Allows data to be written from the VM to the cache storage account in the source region. |
| login.chinacloudapi.cn | Provides authorization and authentication to Site Recovery service URLs. |
| *.hypervrecoverymanager.windowsazure.cn | Allows the VM to communicate with the Site Recovery service. |
| *.servicebus.chinacloudapi.cn | Allows the VM to write Site Recovery monitoring and diagnostics data. |
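When reviewing a proxy or firewall allow list, it can help to check whether a given hostname is covered by the patterns in the table. The following Python sketch does this with the standard library's `fnmatch` module; the function name and the sample hostnames are hypothetical, and shell-style wildcard matching is an assumption about how the patterns are meant to be read.

```python
from fnmatch import fnmatchcase
from typing import Optional

# URL patterns copied from the table above, with a short purpose note.
ALLOWED_HOST_PATTERNS = {
    "*.blob.core.chinacloudapi.cn": "cache storage account writes",
    "login.chinacloudapi.cn": "authorization and authentication",
    "*.hypervrecoverymanager.windowsazure.cn": "Site Recovery service communication",
    "*.servicebus.chinacloudapi.cn": "monitoring and diagnostics data",
}


def matching_pattern(hostname: str) -> Optional[str]:
    """Return the first allowed pattern that covers `hostname`, or None."""
    host = hostname.lower().rstrip(".")
    for pattern in ALLOWED_HOST_PATTERNS:
        if fnmatchcase(host, pattern):
            return pattern
    return None


if __name__ == "__main__":
    # "cache01" is a made-up storage account name.
    for host in ("cache01.blob.core.chinacloudapi.cn", "login.chinacloudapi.cn", "example.com"):
        print(host, "->", matching_pattern(host))
```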

Outbound connectivity for IP address ranges

To control outbound connectivity for VMs using IP addresses, allow these addresses. Details of the network connectivity requirements can be found in the networking white paper.

Source region rules

| Rule | Details | Service tag |
| --- | --- | --- |
| Allow HTTPS outbound: port 443 | Allow ranges that correspond to storage accounts in the source region. | Storage |
| Allow HTTPS outbound: port 443 | Allow ranges that correspond to Azure Active Directory (Azure AD). | AzureActiveDirectory |
| Allow HTTPS outbound: port 443 | Allow ranges that correspond to Event Hubs in the target region. | EventsHub |
| Allow HTTPS outbound: port 443 | Allow access to Site Recovery endpoints that correspond to the target location. | |

Target region rules

| Rule | Details | Service tag |
| --- | --- | --- |
| Allow HTTPS outbound: port 443 | Allow ranges that correspond to storage accounts in the target region. | Storage |
| Allow HTTPS outbound: port 443 | Allow ranges that correspond to Azure AD. | AzureActiveDirectory |
| Allow HTTPS outbound: port 443 | Allow ranges that correspond to Event Hubs in the source region. | EventsHub |
| Allow HTTPS outbound: port 443 | Allow access to Site Recovery endpoints that correspond to the source location. | |
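The two rule tables mirror each other: storage is scoped to the local region, while the event hub and Site Recovery endpoints are scoped to the opposite region. A minimal Python sketch of that symmetry follows. The `OutboundRule` type, the function name, and the region labels are hypothetical; the service tag strings are copied from the tables as written, and no Azure API is called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OutboundRule:
    """Hypothetical description of one outbound HTTPS rule."""

    purpose: str
    region: Optional[str]       # region the destination ranges belong to
    service_tag: Optional[str]  # None where the table lists no tag
    port: int = 443
    direction: str = "Outbound"


def outbound_rules(local_region: str, remote_region: str) -> List[OutboundRule]:
    """Build the four rules for VMs in `local_region` replicating to `remote_region`."""
    return [
        OutboundRule("Storage accounts", local_region, "Storage"),
        OutboundRule("Azure Active Directory", None, "AzureActiveDirectory"),
        OutboundRule("Event hub", remote_region, "EventsHub"),
        OutboundRule("Site Recovery endpoints", remote_region, None),
    ]


if __name__ == "__main__":
    # Region names are placeholders, not real Azure region identifiers.
    for rule in outbound_rules("source-region", "target-region"):
        print(rule)
```

Calling the function with the arguments swapped gives the target region rules.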

Control access with NSG rules

If you control VM connectivity by filtering network traffic to and from Azure networks/subnets using NSG rules, note the following requirements:

  • NSG rules for the source Azure region should allow outbound access for replication traffic.
  • We recommend you create rules in a test environment before you put them into production.
  • Use service tags instead of allowing individual IP addresses.
    • Service tags represent a group of IP address prefixes gathered together to minimize complexity when creating security rules.
    • Azure automatically updates service tags over time.

Learn more about outbound connectivity for Site Recovery, and controlling connectivity with NSGs.

Connectivity for multi-VM consistency

If you enable multi-VM consistency, machines in the replication group communicate with each other over port 20004.

  • Ensure that no firewall appliance blocks the internal communication between the VMs over port 20004.
  • If you want Linux VMs to be part of a replication group, ensure that outbound traffic on port 20004 is manually opened as per the guidance for the specific Linux version.
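One way to verify that port 20004 is reachable between two machines in a replication group is a plain TCP connection test. The sketch below uses Python's standard `socket` module; the function name is hypothetical, and it only confirms that something accepts TCP connections on the port, not that the Site Recovery service itself is healthy.

```python
import socket

MULTI_VM_PORT = 20004  # port used for multi-VM consistency, per the article


def can_reach(host: str, port: int = MULTI_VM_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "10.0.0.5" is a placeholder for another VM in the replication group.
    print(can_reach("10.0.0.5"))
```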

Failover process

When you initiate a failover, the VMs are created in the target resource group, target virtual network, target subnet, and in the target availability set. During a failover, you can use any recovery point.
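Because any recovery point can be used, the choice usually comes down to how recent the point is and which consistency type it has. The Python sketch below picks the most recent point, optionally restricted to app-consistent ones. The `RecoveryPoint` type and the selection function are hypothetical and only illustrate the trade-off; they are not the options Site Recovery presents.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical record of one recovery point."""

    taken_at: datetime
    app_consistent: bool = False  # False means crash-consistent


def latest_point(
    points: Iterable[RecoveryPoint], app_consistent_only: bool = False
) -> Optional[RecoveryPoint]:
    """Return the most recent point, or None if no point qualifies."""
    candidates = [p for p in points if p.app_consistent or not app_consistent_only]
    return max(candidates, key=lambda p: p.taken_at, default=None)


if __name__ == "__main__":
    points = [
        RecoveryPoint(datetime(2024, 1, 1, 8, 0), app_consistent=True),
        RecoveryPoint(datetime(2024, 1, 1, 11, 55)),
        RecoveryPoint(datetime(2024, 1, 1, 12, 0)),
    ]
    print("lowest data loss:", latest_point(points))
    print("app-consistent:  ", latest_point(points, app_consistent_only=True))
```

The most recent crash-consistent point minimizes data loss, while the most recent app-consistent point is older but captures flushed application state.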

Figure: Failover process

Next steps

Quickly replicate an Azure VM to a secondary region.