Backup and disaster recovery for Azure IaaS disks

This article explains how to plan for backup and disaster recovery (DR) of IaaS virtual machines (VMs) and disks in Azure. It covers both managed and unmanaged disks.

First, we cover the built-in fault tolerance capabilities in the Azure platform that help guard against local failures. We then discuss the disaster scenarios not fully covered by the built-in capabilities. We also show several examples of workload scenarios where different backup and DR considerations can apply. Finally, we review possible solutions for the DR of IaaS disks.

Introduction

The Azure platform uses various methods for redundancy and fault tolerance to help protect customers from localized hardware failures. Local failures can include problems with an Azure Storage server machine that stores part of the data for a virtual disk, or failures of an SSD or HDD on that server. Such isolated hardware component failures can happen during normal operations.

The Azure platform is designed to be resilient to these failures. Major disasters, however, can result in failures or the inaccessibility of many storage servers or even a whole datacenter. Although your VMs and disks are normally protected from localized failures, additional steps are necessary to protect your workload from region-wide catastrophic failures, such as a major disaster, that can affect your VMs and disks.

In addition to the possibility of platform failures, problems with a customer application or data can occur. For example, a new version of your application might inadvertently make a change to the data that causes it to break. In that case, you might want to revert the application and the data to a prior version that contains the last known good state. This requires maintaining regular backups.

For regional disaster recovery, you must back up your IaaS VM disks to a different region.

Before we look at backup and DR options, let's recap a few methods available for handling localized failures.

Azure IaaS resiliency

Resiliency refers to the tolerance for normal failures that occur in hardware components. Resiliency is the ability to recover from failures and continue to function. It's not about avoiding failures, but about responding to failures in a way that avoids downtime or data loss. The goal of resiliency is to return the application to a fully functioning state following a failure. Azure virtual machines and disks are designed to be resilient to common hardware faults. Let's look at how the Azure IaaS platform provides this resiliency.

A virtual machine consists mainly of two parts: a compute server and the persistent disks. Both affect the fault tolerance of a virtual machine.

If the Azure compute host server that houses your VM experiences a hardware failure, which is rare, Azure is designed to automatically restore the VM on another server. In this scenario, the VM reboots and comes back up after some time. Azure automatically detects such hardware failures and executes recoveries to help ensure the customer VM is available as soon as possible.

Regarding IaaS disks, the durability of data is critical for a persistent storage platform. Azure customers have important business applications running on IaaS, and they depend on the persistence of the data. Azure protects these IaaS disks by keeping three redundant copies of the data, stored locally. These copies provide high durability against local failures. If one of the hardware components that holds your disk fails, your VM is not affected, because there are two additional copies to support disk requests. The VM continues to work even if two different hardware components that support a disk fail at the same time (which is rare).

To ensure that three replicas are always maintained, Azure Storage automatically spawns a new copy of the data in the background if one of the three copies becomes unavailable. Therefore, it should not be necessary to use RAID with Azure disks for fault tolerance. A simple RAID 0 configuration should be sufficient for striping the disks, if necessary, to create larger volumes.
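To make the striping idea concrete, the following minimal Python sketch shows how a RAID 0 layout maps a byte offset in the combined volume to one of the underlying disks. It only illustrates the general RAID 0 addressing scheme; the function name and the 64 KiB stripe size are assumptions for illustration, not values taken from Azure.

```python
from __future__ import annotations


def locate_in_stripe_set(logical_offset: int, disk_count: int, stripe_size: int) -> tuple[int, int]:
    """Map a byte offset in a RAID 0 volume to (disk index, byte offset on that disk)."""
    if logical_offset < 0 or disk_count < 1 or stripe_size < 1:
        raise ValueError("offset must be >= 0; disk_count and stripe_size must be >= 1")
    # Which stripe unit the offset falls in, and where inside that unit.
    stripe_index, offset_in_stripe = divmod(logical_offset, stripe_size)
    # Stripe units are dealt out to the disks round-robin.
    disk_index = stripe_index % disk_count
    # Number of complete rows of stripe units that precede this one on the same disk.
    stripe_row = stripe_index // disk_count
    return disk_index, stripe_row * stripe_size + offset_in_stripe


if __name__ == "__main__":
    # Hypothetical volume: four data disks striped with a 64 KiB stripe unit.
    for offset in (0, 65536, 4 * 65536 + 10):
        print(offset, locate_in_stripe_set(offset, disk_count=4, stripe_size=65536))
```

Because RAID 0 holds no parity or mirror data, the volume's durability in this setup comes entirely from the three replicas that the storage platform keeps for each disk.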

Because of this architecture, Azure has consistently delivered enterprise-grade durability for IaaS disks, with an industry-leading zero percent annualized failure rate.

Localized hardware faults on the compute host or in the Storage platform can sometimes result in the temporary unavailability of the VM, which is covered by the Azure SLA for VM availability. Azure also provides an industry-leading SLA for single VM instances that use Azure premium SSDs.

To safeguard application workloads from downtime due to the temporary unavailability of a disk or VM, customers can use availability sets. Two or more virtual machines in an availability set provide redundancy for the application. Azure then creates these VMs and disks in separate fault domains with different power, network, and server components.

Because of these separate fault domains, localized hardware failures typically do not affect multiple VMs in the set at the same time. Having separate fault domains provides high availability for your application. It's considered a good practice to use availability sets when high availability is required. The next section covers the disaster recovery aspect.

Backup and disaster recovery

Disaster recovery is the ability to recover from rare, but major, incidents. These incidents include non-transient, wide-scale failures, such as a service disruption that affects an entire region. Disaster recovery includes data backup and archiving, and might include manual intervention, such as restoring a database from a backup.

The Azure platform's built-in protection against localized failures might not fully protect the VMs and disks if a major disaster causes large-scale outages. These large-scale outages include catastrophic events, such as a datacenter being hit by a hurricane, earthquake, or fire, or a large-scale hardware unit failure. In addition, you might encounter failures due to application or data issues.

To help protect your IaaS workloads from outages, you should plan for redundancy and have backups to enable recovery. For disaster recovery, you should back up to a different geographic location, away from the primary site. This approach helps ensure your backup is not affected by the same event that originally affected the VM or disks. For more information, see Disaster recovery for Azure applications.

Your DR considerations might include the following aspects:

  • High availability: The ability of the application to continue running in a healthy state, without significant downtime. A healthy state means that the application is responsive, and users can connect to the application and interact with it. Certain mission-critical applications and databases might be required to always be available, even when there are failures in the platform. For these workloads, you might need to plan redundancy for the application, as well as the data.

  • Data durability: In some cases, the main consideration is ensuring that the data is preserved if a disaster happens. Therefore, you might need a backup of your data in a different site. For such workloads, you might not need full redundancy for the application, but only a regular backup of the disks.

Backup and DR scenarios

Let's look at a few typical examples of application workload scenarios and the considerations for planning for disaster recovery.

Scenario 1: Major database solutions

Consider a production database server, like SQL Server or Oracle, that can support high availability. Critical production applications and users depend on this database. The disaster recovery plan for this system might need to support the following requirements:

  • The data must be protected and recoverable.
  • The server must be available for use.

The disaster recovery plan might require maintaining a replica of the database in a different region as a backup. Depending on the requirements for server availability and data recovery, the solution might range from an active-active or active-passive replica site to periodic offline backups of the data. Relational databases, such as SQL Server and Oracle, provide various options for replication. For SQL Server, use SQL Server AlwaysOn Availability Groups for high availability.

NoSQL databases, like MongoDB, also support replicas for redundancy. These replicas are used for high availability.

Scenario 2: A cluster of redundant VMs

Consider a workload handled by a cluster of VMs that provide redundancy and load balancing. One example is a Cassandra cluster deployed in a region. This type of architecture already provides a high level of redundancy within that region. However, to protect the workload from a regional-level failure, you should consider spreading the cluster across two regions or making periodic backups to another region.

Scenario 3: IaaS application workload

Let's look at the IaaS application workload. For example, this application might be a typical production workload running on an Azure VM. It might be a web server or file server holding the content and other resources of a site. It might also be a custom-built business application running on a VM that stores its data, resources, and application state on the VM disks. In this case, it's important to make sure you take backups on a regular basis. Backup frequency should be based on the nature of the VM workload. For example, if the application runs every day and modifies data, then the backup should be taken every hour.

Another example is a reporting server that pulls data from other sources and generates aggregated reports. The loss of this VM or its disks might lead to the loss of the reports. However, it might be possible to rerun the reporting process and regenerate the output. In that case, you don't really have a loss of data, even if the reporting server is hit with a disaster. As a result, you might have a higher level of tolerance for losing part of the data on the reporting server. In that case, less frequent backups are an option to reduce costs.

Scenario 4: IaaS application data issues

IaaS application data issues are another possibility. Consider an application that computes, maintains, and serves critical commercial data, such as pricing information. A new version of the application has a software bug that incorrectly computes the pricing and corrupts the existing commerce data served by the platform. Here, the best course of action is to revert to the earlier version of the application and the data. To enable this, take periodic backups of your system.

Disaster recovery solution: Azure Backup

Azure Backup is used for backups and DR, and it works with managed disks as well as unmanaged disks. You can create a backup job with time-based backups, easy VM restoration, and backup retention policies.

If you use premium SSDs, managed disks, or other disk types with the locally redundant storage option, it's especially important to make periodic DR backups. Azure Backup stores the data in your recovery services vault for long-term retention. Choose the geo-redundant storage option for the backup recovery services vault. That option ensures that backups are replicated to a different Azure region to safeguard them from regional disasters.

For unmanaged disks, you can use the locally redundant storage type for IaaS disks, but ensure that Azure Backup is enabled with the geo-redundant storage option for the recovery services vault.

Note

If you use the geo-redundant storage or read-access geo-redundant storage option for your unmanaged disks, you still need consistent snapshots for backup and DR. Use either Azure Backup or consistent snapshots.

The following table is a summary of the solutions available for DR.

Scenario | Automatic replication | DR solution
Premium SSD disks | Local (locally redundant storage) | Azure Backup
Managed disks | Local (locally redundant storage) | Azure Backup
Unmanaged locally redundant storage disks | Local (locally redundant storage) | Azure Backup
Unmanaged geo-redundant storage disks | Cross region (geo-redundant storage) | Azure Backup; consistent snapshots
Unmanaged read-access geo-redundant storage disks | Cross region (read-access geo-redundant storage) | Azure Backup; consistent snapshots

High availability is best met by using managed disks in an availability set along with Azure Backup. If you use unmanaged disks, you can still use Azure Backup for DR. If you are unable to use Azure Backup, then taking consistent snapshots, as described in a later section, is an alternative solution for backup and DR.

Your choices for high availability, backup, and DR at application or infrastructure levels can be represented as follows:

Level | High availability | Backup or DR
Application | SQL Server AlwaysOn | Azure Backup
Infrastructure | Availability set | Geo-redundant storage with consistent snapshots

Using Azure Backup

Azure Backup can back up your VMs running Windows or Linux to the Azure recovery services vault. Backing up and restoring business-critical data is complicated by the fact that the data must be backed up while the applications that produce it are running.

To address this issue, Azure Backup provides application-consistent backups for Azure workloads. On Windows, it uses the Volume Shadow Copy Service (VSS) to ensure that data is written correctly to storage. For Linux VMs, the default backup consistency mode is file-consistent backup, because Linux does not have functionality equivalent to VSS. For Linux machines, see Application-consistent backup of Azure Linux VMs.

Figure: Azure Backup flow

When Azure Backup initiates a backup job at the scheduled time, it triggers the backup extension installed in the VM to take a point-in-time snapshot. The snapshot is taken in coordination with the Volume Shadow Copy Service (VSS) to get a consistent snapshot of the disks in the virtual machine without having to shut it down. The backup extension in the VM flushes all writes before taking a consistent snapshot of all of the disks. After the snapshot is taken, Azure Backup transfers the data to the backup vault. To make the backup process more efficient, the service identifies and transfers only the blocks of data that have changed since the last backup.
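The idea of transferring only changed blocks can be illustrated with a short Python sketch. This is not how Azure Backup tracks changes internally (the source does not describe that mechanism); it simply compares per-block SHA-256 digests, and the function names and the tiny 4-byte block size are assumptions chosen to keep the example readable.

```python
from __future__ import annotations

import hashlib


def block_digests(data: bytes, block_size: int) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(previous_digests: list[str], current: bytes, block_size: int) -> dict[int, bytes]:
    """Return {block index: block bytes} for blocks that are new or differ from the last backup."""
    changed = {}
    for index, start in enumerate(range(0, len(current), block_size)):
        block = current[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(previous_digests) or previous_digests[index] != digest:
            changed[index] = block
    return changed


if __name__ == "__main__":
    last_backup = b"aaaabbbbcccc"
    disk_now = b"aaaaXbbbccccdd"  # block 1 modified, block 3 appended
    digests = block_digests(last_backup, block_size=4)
    print(changed_blocks(digests, disk_now, block_size=4))
```

Only the blocks returned by `changed_blocks` would need to be sent to the vault; unchanged blocks are already present from the earlier backup.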

To restore, you can view the available backups through Azure Backup and then initiate a restore. You can create and restore Azure backups through the Azure portal, by using PowerShell, or by using the Azure CLI.

Steps to enable a backup

Use the following steps to enable backups of your VMs by using the Azure portal. There is some variation depending on your exact scenario. Refer to the Azure Backup documentation for full details. Azure Backup also supports VMs with managed disks.

  1. Create a recovery services vault for a VM:

    a. In the Azure portal, browse All resources and find Recovery Services vaults.

    b. On the Recovery Services vaults menu, click Add and follow the steps to create a new vault in the same region as the VM. For example, if your VM is in the China North region, pick China North for the vault.

  2. Verify the storage replication for the newly created vault. Access the vault under Recovery Services vaults and go to Properties > Backup Configuration > Update. Confirm that the geo-redundant storage option, which is the default, is selected. This option ensures that your vault is automatically replicated to a secondary datacenter. For example, your vault in China North is automatically replicated to China East.

  3. Configure the backup policy and select the VM from the same UI.

  4. Make sure the Backup Agent is installed on the VM. If your VM is created by using an Azure gallery image, then the Backup Agent is already installed. Otherwise (that is, if you use a custom image), use the instructions to install the VM agent on a virtual machine.

  5. Make sure that the VM allows the network connectivity that the backup service needs to function. Follow the instructions for network connectivity.

  6. After the previous steps are completed, the backup runs at regular intervals as specified in the backup policy. If necessary, you can trigger the first backup manually from the vault dashboard on the Azure portal.
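For readers who prefer scripting to the portal, the vault and protection steps above can be approximated with the Azure CLI. The Python sketch below only builds and prints the command lines; it does not execute anything. The resource names (`myResourceGroup`, `myVault`, `myVM`), the `chinanorth` location string, and the use of the built-in `DefaultPolicy` are assumptions for illustration, and the exact flags should be checked against `az backup --help` for your CLI version before use.

```python
from __future__ import annotations

import shlex


def backup_setup_commands(resource_group: str, vault: str, location: str,
                          vm: str, policy: str = "DefaultPolicy") -> list[list[str]]:
    """Build (but do not run) Azure CLI commands mirroring the portal steps."""
    return [
        # Step 1: create the recovery services vault in the same region as the VM.
        ["az", "backup", "vault", "create",
         "--resource-group", resource_group, "--name", vault, "--location", location],
        # Step 2: make sure the vault uses geo-redundant storage.
        ["az", "backup", "vault", "backup-properties", "set",
         "--resource-group", resource_group, "--name", vault,
         "--backup-storage-redundancy", "GeoRedundant"],
        # Step 3: attach a backup policy to the VM.
        ["az", "backup", "protection", "enable-for-vm",
         "--resource-group", resource_group, "--vault-name", vault,
         "--vm", vm, "--policy-name", policy],
    ]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    for command in backup_setup_commands("myResourceGroup", "myVault", "chinanorth", "myVM"):
        print(shlex.join(command))
```

Building the commands as argument lists, rather than as one shell string, keeps quoting unambiguous if the list is later passed to `subprocess.run`.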

For automating Azure Backup by using scripts, refer to PowerShell cmdlets for VM backup.

Steps for recovery

If you need to repair or rebuild a VM, you can restore the VM from any of the backup recovery points in the vault. There are two options for performing the recovery:

  • You can create a new VM as a point-in-time representation of your backed-up VM.

  • You can restore the disks, and then use the template for the VM to customize and rebuild the restored VM.

For more information, see the instructions to use the Azure portal to restore virtual machines. That document also explains the specific steps for restoring backed-up VMs to a paired datacenter by using your geo-redundant backup vault if there is a disaster at the primary datacenter. In that case, Azure Backup uses the Compute service from the secondary region to create the restored virtual machine.

You can also use PowerShell for creating a new VM from restored disks.

Alternative solution: Consistent snapshots

If you are unable to use Azure Backup, you can implement your own backup mechanism by using snapshots. Creating consistent snapshots for all the disks used by a VM and then replicating those snapshots to another region is complicated. For this reason, using the Backup service is considered a better option than building a custom solution.

If you use read-access geo-redundant storage or geo-redundant storage for disks, snapshots are automatically replicated to a secondary datacenter. If you use locally redundant storage for disks, you need to replicate the data yourself. For more information, see Back up Azure unmanaged VM disks with incremental snapshots.

A snapshot is a representation of an object at a specific point in time. A snapshot incurs billing for the incremental size of the data it holds. For more information, see Create a blob snapshot.

Create snapshots while the VM is running

Although you can take a snapshot at any time, if the VM is running, there is still data being streamed to the disks. The snapshots might contain partial operations that were in flight. Also, if there are several disks involved, the snapshots of different disks might have been taken at different times. These scenarios can cause the snapshots to be uncoordinated. This lack of coordination is especially problematic for striped volumes, whose files might be corrupted if changes were being made during backup.

To avoid this situation, the backup process must implement the following steps:

  1. Freeze all the disks.

  2. Flush all the pending writes.

  3. Create a blob snapshot for all the disks.

Some Windows applications, like SQL Server, provide a coordinated backup mechanism via the Volume Shadow Copy Service (VSS) to create application-consistent backups. On Linux, you can use a tool like fsfreeze for coordinating the disks. This tool provides file-consistent backups, but not application-consistent snapshots. This process is complex, so you should consider using Azure Backup or a third-party backup solution that already implements this procedure.
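On Linux, the freeze, flush, and snapshot sequence could look roughly like the following Python sketch. It relies on the `fsfreeze` utility, which flushes pending writes when it freezes a filesystem. The `snapshot_disk` callback is a hypothetical placeholder for whatever call creates the blob snapshot, and the `run` parameter exists only so the sketch can be exercised without root privileges; neither comes from the source.

```python
from __future__ import annotations

import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mount_points, run=subprocess.run):
    """Freeze the given filesystems, and always thaw the ones that were frozen."""
    frozen_so_far = []
    try:
        for mount_point in mount_points:
            # Freezing flushes dirty data and blocks new writes.
            run(["fsfreeze", "--freeze", mount_point], check=True)
            frozen_so_far.append(mount_point)
        yield
    finally:
        for mount_point in reversed(frozen_so_far):
            run(["fsfreeze", "--unfreeze", mount_point], check=True)


def coordinated_snapshot(mount_points, disks, snapshot_disk, run=subprocess.run):
    """Snapshot every disk while all filesystems are frozen; returns {disk: snapshot id}."""
    with frozen(mount_points, run):
        return {disk: snapshot_disk(disk) for disk in disks}
```

Keep the frozen window as short as possible, because applications block on writes until the filesystems are thawed. A production script would also need to handle a failed thaw, which this sketch does not.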

The previous process results in a collection of coordinated snapshots for all of the VM disks, representing a specific point-in-time view of the VM. This is a backup restore point for the VM. You can repeat the process at scheduled intervals to create periodic backups. See Copy the snapshots to another region for steps to copy the snapshots to another region for DR.

Create snapshots while the VM is offline

Another option to create consistent backups is to shut down the VM and take blob snapshots of each disk. Taking blob snapshots of a stopped VM is easier than coordinating snapshots of a running VM, but it requires a few minutes of downtime.

  1. Shut down the VM.

  2. Create a snapshot of each virtual hard drive blob, which only takes a few seconds.

    To create a snapshot, you can use PowerShell, the Azure Storage REST API, Azure CLI, or one of the Azure Storage client libraries, such as the Storage client library for .NET.

  3. Start the VM, which ends the downtime. Typically, the entire process finishes within a few minutes.

This process yields a collection of consistent snapshots for all the disks, providing a backup restore point for the VM.
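The three offline steps can be wrapped so that the VM is restarted even if a snapshot fails. The Python sketch below is deliberately independent of any SDK: `stop_vm`, `start_vm`, and `snapshot_disk` are hypothetical callables that you would implement with the tool of your choice (for example, `snapshot_disk` could wrap the `create_snapshot` method of a `BlobClient` from the `azure-storage-blob` package).

```python
def offline_restore_point(stop_vm, start_vm, snapshot_disk, disks):
    """Stop the VM, snapshot each disk, and restart the VM even if a snapshot fails."""
    stop_vm()
    try:
        # With the VM stopped, no writes are in flight, so the snapshots are consistent.
        return [snapshot_disk(disk) for disk in disks]
    finally:
        # Restarting in `finally` ends the downtime whether or not the snapshots succeeded.
        start_vm()


if __name__ == "__main__":
    # Stand-in callables that only print; replace them with real operations.
    snapshots = offline_restore_point(
        stop_vm=lambda: print("stopping VM"),
        start_vm=lambda: print("starting VM"),
        snapshot_disk=lambda disk: f"{disk}@snapshot",
        disks=["osdisk.vhd", "datadisk1.vhd"],
    )
    print(snapshots)
```

If any snapshot raises an exception, the partial set should be discarded rather than treated as a restore point, because it does not cover all of the VM's disks.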

Copy the snapshots to another region

Creation of the snapshots alone might not be sufficient for DR. You must also replicate the snapshot backups to another region.

If you use geo-redundant storage or read-access geo-redundant storage for your disks, then the snapshots are replicated to the secondary region automatically. There can be a few minutes of lag before the replication completes. If the primary datacenter goes down before the snapshots finish replicating, you cannot access the snapshots from the secondary datacenter. The likelihood of this is small.

Note

Only having the disks in a geo-redundant storage or read-access geo-redundant storage account does not protect the VM from disasters. You must also create coordinated snapshots or use Azure Backup. This is required to recover a VM to a consistent state.

If you use locally redundant storage, you must copy the snapshots to a different storage account immediately after creating the snapshot. The copy target might be a locally redundant storage account in a different region, resulting in the copy being in a remote region. You can also copy the snapshot to a read-access geo-redundant storage account in the same region. In this case, the snapshot is lazily replicated to the remote secondary region. Your backup is protected from disasters at the primary site after the copying and replication are complete.
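The rules in this section can be condensed into a small decision helper. The Python sketch below is only a restatement of the text: the `Redundancy` enum and the wording of the returned actions are illustrative assumptions, not part of any Azure API.

```python
from __future__ import annotations

from enum import Enum


class Redundancy(Enum):
    LRS = "locally redundant storage"
    GRS = "geo-redundant storage"
    RA_GRS = "read-access geo-redundant storage"


def needs_manual_copy(redundancy: Redundancy) -> bool:
    """Only locally redundant storage requires you to copy snapshots out of the region yourself."""
    return redundancy is Redundancy.LRS


def dr_actions(redundancy: Redundancy) -> list[str]:
    """List the steps needed to get a usable DR copy for a disk with the given redundancy."""
    # Coordinated snapshots (or Azure Backup) are required regardless of redundancy.
    actions = ["create coordinated snapshots of all disks (or use Azure Backup)"]
    if needs_manual_copy(redundancy):
        actions.append("copy each snapshot to a storage account in another region, "
                       "or to an RA-GRS account, immediately after creating it")
    else:
        actions.append("wait for asynchronous replication to the secondary region to finish")
    return actions


if __name__ == "__main__":
    for option in Redundancy:
        print(option.name, "->", dr_actions(option))
```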

To copy your incremental snapshots for DR efficiently, review the instructions in Back up Azure unmanaged VM disks with incremental snapshots.

Recovery from snapshots

To retrieve a snapshot, copy it to make a new blob. If you are copying the snapshot from the primary account, you can copy the snapshot over to the base blob of the snapshot. This process reverts the disk to the snapshot and is known as promoting the snapshot. If you are copying the snapshot backup from a secondary account, in the case of a read-access geo-redundant storage account, you must copy it to a primary account. You can copy a snapshot by using PowerShell or by using the AzCopy utility. For more information, see Transfer data with the AzCopy command-line utility.

For VMs with multiple disks, you must copy all the snapshots that are part of the same coordinated restore point. After you copy the snapshots to writable VHD blobs, you can use the blobs to recreate your VM by using the template for the VM.
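The multi-disk requirement is easy to get wrong in a script, so a guard like the following Python sketch can help. It assumes that each snapshot was tagged at creation time with a restore point label shared by all snapshots taken together; the `Snapshot` class and the label format are hypothetical and are not part of the Azure Storage API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    disk: str            # name of the VHD blob the snapshot belongs to
    restore_point: str   # hypothetical label shared by snapshots taken together


def snapshots_for_restore(snapshots: list[Snapshot], disks: list[str],
                          restore_point: str) -> list[Snapshot]:
    """Return one snapshot per disk from the given restore point, or fail if any disk is missing."""
    chosen = {s.disk: s for s in snapshots if s.restore_point == restore_point}
    missing = sorted(set(disks) - chosen.keys())
    if missing:
        raise ValueError(f"restore point {restore_point!r} has no snapshot for: {missing}")
    return [chosen[disk] for disk in disks]


if __name__ == "__main__":
    available = [
        Snapshot("os.vhd", "2024-01-01T00:00"),
        Snapshot("data.vhd", "2024-01-01T00:00"),
        Snapshot("os.vhd", "2024-01-02T00:00"),  # data.vhd snapshot missing for this point
    ]
    print(snapshots_for_restore(available, ["os.vhd", "data.vhd"], "2024-01-01T00:00"))
```

Failing fast on an incomplete restore point avoids rebuilding a VM from disks that were captured at different times.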

其他选项Other options

SQL ServerSQL Server

在 VM 中运行的 SQL Server 有自己的内置功能,可将 SQL Server 数据库备份到 Azure Blob 存储或文件共享。SQL Server running in a VM has its own built-in capabilities to back up your SQL Server database to Azure Blob storage or a file share. 如果存储帐户是异地冗余存储或读取访问权限异地冗余存储,那么在发生灾难时,可以在存储帐户的辅助数据中心内访问这些备份,但需要遵循前面所述的相同限制。If the storage account is geo-redundant storage or read-access geo-redundant storage, you can access those backups in the storage account's secondary datacenter in the event of a disaster, with the same restrictions as previously discussed. 有关详细信息,请参阅 Azure 虚拟机中 SQL Server 的备份和还原For more information, see Back up and restore for SQL Server in Azure virtual machines. 除了备份和还原外,SQL Server AlwaysOn 可用性组还可以维护数据库的次要副本。In addition to back up and restore, SQL Server AlwaysOn availability groups can maintain secondary replicas of databases. 这种功能可以大大减少灾难恢复时间。This ability greatly reduces the disaster recovery time.

其他注意事项Other considerations

本文介绍了如何通过备份或生成 VM 及其磁盘的快照来支持灾难恢复,以及如何使用这些备份或快照恢复数据。This article has discussed how to back up or take snapshots of your VMs and their disks to support disaster recovery and how to use those backups or snapshots to recover your data. 通过 Azure 资源管理器模型,许多人都可以使用模板在 Azure 中创建虚拟机和其他基础结构。With the Azure Resource Manager model, many people use templates to create their VMs and other infrastructures in Azure. 可以使用模板创建每次配置都相同的 VM。You can use a template to create a VM that has the same configuration every time. 如果使用自定义映像创建 VM,还必须务必使用读取访问权限异地冗余存储帐户存储这些映像,从而保护它们。If you use custom images for creating your VMs, you must also make sure that your images are protected by using a read-access geo-redundant storage account to store them.

因此,备份过程可能分为以下两部分:Consequently, your backup process can be a combination of two things:

  • Back up the data (disks).
  • Back up the configuration (templates and custom images).

Depending on the backup option you choose, you might have to handle the backup of both the data and the configuration yourself, or the backup service might handle all of that for you.

Appendix: Understanding the impact of data redundancy

For storage accounts in Azure, there are three types of data redundancy to consider for disaster recovery: locally redundant, geo-redundant, and geo-redundant with read access.

Locally redundant storage retains three copies of the data in the same datacenter. When the VM writes the data, all three copies are updated before success is returned to the caller, so you know they are identical. Your disk is protected from local failures because it's unlikely that all three copies are affected at the same time. Locally redundant storage provides no geo-redundancy, so the disk is not protected from catastrophic failures that can affect an entire datacenter or storage unit.

With geo-redundant storage and read-access geo-redundant storage, three copies of your data are retained in the primary region that you select. Three more copies of your data are retained in a corresponding secondary region that is set by Azure. For example, if you store data in China North, the data is replicated to China East. Replication to the secondary region is asynchronous, so there is a small delay between updates to the primary and secondary sites. Replicas of the disks on the secondary site are consistent on a per-disk basis (with the delay), but replicas of multiple active disks might not be in sync with each other. To have consistent replicas across multiple disks, consistent snapshots are needed.

The main difference between geo-redundant storage and read-access geo-redundant storage is that with read-access geo-redundant storage, you can read the secondary copy at any time. If there is a problem that renders the data in the primary region inaccessible, the Azure team makes every effort to restore access. While the primary is down, if you have read-access geo-redundant storage enabled, you can access the data in the secondary datacenter. Therefore, if you plan to read from the replica while the primary is inaccessible, consider using read-access geo-redundant storage.
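With read-access geo-redundant storage, the secondary copy is read through a separate endpoint whose host name is the account name with `-secondary` appended. The following minimal sketch builds that URL for Blob storage. The function name and the account name in the usage note are hypothetical, and the default endpoint suffix assumes an account in Azure China (`core.chinacloudapi.cn`); pass `core.windows.net` for global Azure.

```python
import re

# Storage account names are 3 to 24 characters: lowercase letters and digits.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def secondary_blob_endpoint(account_name, endpoint_suffix="core.chinacloudapi.cn"):
    """Return the read-only secondary Blob endpoint for an RA-GRS account.

    account_name:    storage account name (hypothetical in the examples).
    endpoint_suffix: cloud-specific suffix; the default is an assumption
                     that the account lives in Azure China.
    """
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}-secondary.blob.{endpoint_suffix}"
```

For a hypothetical account named `contosobackups`, `secondary_blob_endpoint("contosobackups")` yields `https://contosobackups-secondary.blob.core.chinacloudapi.cn`. This endpoint is read-only and, because replication is asynchronous, it can lag slightly behind the primary.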

If the outage turns out to be significant, the Azure team might trigger a geo-failover and change the primary DNS entries to point to secondary storage. At this point, if you have either geo-redundant storage or read-access geo-redundant storage enabled, you can access the data in the region that used to be the secondary. In other words, if your storage account is geo-redundant storage and there is a problem, you can access the secondary storage only if there is a geo-failover.
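The access rules described above can be summarized as a small decision function. This is an illustrative sketch only: the function name is hypothetical, and the strings `LRS`, `GRS`, and `RA-GRS` are the usual abbreviations for locally redundant, geo-redundant, and read-access geo-redundant storage.

```python
def can_access_secondary(redundancy, geo_failover_done):
    """Return True if data in the secondary region can be accessed.

    redundancy:        "LRS", "GRS", or "RA-GRS".
    geo_failover_done: True once Azure has triggered a geo-failover.
    """
    if redundancy == "LRS":
        # Locally redundant storage has no copy in a secondary region.
        return False
    if redundancy == "RA-GRS":
        # Readable at any time, before or after a geo-failover.
        return True
    if redundancy == "GRS":
        # Reachable only after Azure performs a geo-failover.
        return geo_failover_done
    raise ValueError(f"unknown redundancy type: {redundancy!r}")
```

The only case in which the result depends on the failover flag is geo-redundant storage without read access, which is the distinction the preceding paragraphs draw.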

For more information, see What to do if an Azure Storage outage occurs.

Note

Azure controls whether a failover occurs. Failover is not controlled per storage account, so it's not decided by individual customers. To implement disaster recovery for specific storage accounts or virtual machine disks, you must use the techniques described previously in this article.