Disaster recovery and storage account failover

Azure strives to ensure that Azure services are always available. However, unplanned service outages may occur. If your application requires resiliency, Azure recommends using geo-redundant storage, so that your data is copied to a second region. Additionally, customers should have a disaster recovery plan in place for handling a regional service outage. An important part of a disaster recovery plan is preparing to fail over to the secondary endpoint if the primary endpoint becomes unavailable.

Azure Storage supports account failover for geo-redundant storage accounts. With account failover, you can initiate the failover process for your storage account if the primary endpoint becomes unavailable. The failover updates the secondary endpoint to become the primary endpoint for your storage account. Once the failover is complete, clients can begin writing to the new primary endpoint.

Account failover is available for general-purpose v1, general-purpose v2, and Blob storage account types with Azure Resource Manager deployments.

This article describes the concepts and process involved with an account failover and discusses how to prepare your storage account for recovery with the least amount of customer impact. To learn how to initiate an account failover in the Azure portal or PowerShell, see Initiate an account failover.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Choose the right redundancy option

Azure Storage maintains multiple copies of your storage account to ensure durability and high availability. Which redundancy option you choose for your account depends on the degree of resiliency you need. For protection against regional outages, configure your account for geo-redundant storage, with or without the option of read access from the secondary region:

Geo-redundant storage (GRS) asynchronously copies your data to a secondary region that is at least hundreds of miles away from the primary region. If the primary region suffers an outage, the secondary region serves as a redundant source for your data. You can initiate a failover to transform the secondary endpoint into the primary endpoint.

Read-access geo-redundant storage (RA-GRS) provides geo-redundant storage with the additional benefit of read access to the secondary endpoint. If an outage occurs at the primary endpoint, applications that are configured for read access to the secondary and designed for high availability can continue to read from the secondary endpoint. Azure recommends RA-GRS for maximum availability and durability for your applications.
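Below is a minimal Python sketch of the RA-GRS read-fallback pattern. It assumes the standard public-cloud endpoint naming, where the secondary endpoint appends `-secondary` to the account name. `endpoints`, `read_with_fallback`, and the `fetch` transport callable are hypothetical names and are not part of an Azure SDK. Because replication is asynchronous, data read from the secondary may be stale.

```python
from typing import Callable, Tuple


def endpoints(account: str, service: str = "blob") -> Tuple[str, str]:
    """Return the (primary, secondary) endpoint URLs for an RA-GRS account."""
    primary = f"https://{account}.{service}.core.windows.net"
    secondary = f"https://{account}-secondary.{service}.core.windows.net"
    return primary, secondary


def read_with_fallback(
    account: str, path: str, fetch: Callable[[str], bytes]
) -> Tuple[str, bytes]:
    """Read from the primary endpoint; if it is unreachable, read from the secondary.

    `fetch` is a hypothetical transport function: it takes a full URL and
    returns the response body, or raises on a network failure.
    Returns the endpoint that served the read together with the body.
    """
    primary, secondary = endpoints(account)
    try:
        return primary, fetch(f"{primary}/{path}")
    except (ConnectionError, TimeoutError):
        # The secondary lags the primary (see Last Sync Time), so this data may be stale.
        return secondary, fetch(f"{secondary}/{path}")
```

Returning the serving endpoint lets the caller record or surface the fact that the data came from the secondary and may be out of date.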

For more information about redundancy in Azure Storage, see Azure Storage redundancy.

Warning

Geo-redundant storage carries a risk of data loss. Data is copied to the secondary region asynchronously, so there is a delay between when data is written to the primary region and when it is written to the secondary region. In the event of an outage, write operations to the primary endpoint that have not yet been copied to the secondary endpoint will be lost.

Design for high availability

It's important to design your application for high availability from the start. Refer to these Azure resources for guidance in designing your application and planning for disaster recovery:

Additionally, keep in mind these best practices for maintaining high availability for your Azure Storage data:

Track outages

Customers can subscribe to the Azure Service Health Dashboard to track the health and status of Azure Storage and other Azure services.

Azure also recommends that you design your application to prepare for the possibility of write failures. Your application should expose write failures in a way that alerts you to the possibility of an outage in the primary region.
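One way to make write failures visible is to wrap the write path so that repeated failures raise an alert. The Python sketch below is illustrative only. `WriteFailureMonitor`, the `write` and `alert` callables, and the default threshold of three consecutive failures are assumptions and are not Azure guidance.

```python
from typing import Callable


class WriteFailureMonitor:
    """Wrap a write callable and alert once after N consecutive failures."""

    def __init__(
        self,
        write: Callable[[bytes], None],
        alert: Callable[[str], None],
        threshold: int = 3,
    ) -> None:
        self._write = write          # hypothetical function that writes to the primary
        self._alert = alert          # hypothetical hook: pager, log, metrics, etc.
        self._threshold = threshold
        self._consecutive = 0
        self._alerted = False

    def write(self, data: bytes) -> bool:
        """Attempt a write; return True on success and False on failure."""
        try:
            self._write(data)
        except Exception as exc:  # count every failure instead of silently retrying
            self._consecutive += 1
            if self._consecutive >= self._threshold and not self._alerted:
                self._alert(
                    f"{self._consecutive} consecutive write failures; "
                    f"possible outage in the primary region: {exc!r}"
                )
                self._alerted = True  # alert once per failure streak
            return False
        # A successful write ends the failure streak.
        self._consecutive = 0
        self._alerted = False
        return True
```

Alerting once per streak, not on every failed call, avoids flooding the on-call channel during a sustained outage.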

Understand the account failover process

Customer-managed account failover enables you to fail your entire storage account over to the secondary region if the primary becomes unavailable for any reason. When you force a failover to the secondary region, clients can begin writing data to the secondary endpoint after the failover is complete. The failover typically takes about an hour.

How an account failover works

Under normal circumstances, a client writes data to an Azure Storage account in the primary region, and that data is copied asynchronously to the secondary region. The following image shows the scenario when the primary region is available:

Image: Client writes data to the storage account in the primary region

If the primary endpoint becomes unavailable for any reason, the client is no longer able to write to the storage account. The following image shows the scenario where the primary has become unavailable, but no recovery has happened yet:

Image: Primary endpoint is unavailable, so the client cannot write data

The customer initiates the account failover to the secondary endpoint. The failover process updates the DNS entry provided by Azure Storage so that the secondary endpoint becomes the new primary endpoint for your storage account, as shown in the following image:

Image: Customer initiates account failover to the secondary endpoint

Write access is restored for geo-redundant accounts once the DNS entry has been updated and requests are being directed to the new primary endpoint. Existing storage service endpoints for blobs, tables, queues, and files remain the same after the failover.

Important

After the failover is complete, the storage account is configured to be locally redundant in the new primary region. To resume replication to the new secondary, configure the account for geo-redundancy again.

Keep in mind that converting an LRS account to use geo-redundancy incurs a cost. This cost applies to updating the storage account in the new primary region after a failover.

Anticipate data loss

Caution

An account failover usually involves some data loss. It's important to understand the implications of initiating an account failover.

Because data is copied asynchronously from the primary region to the secondary region, there is always a delay before a write to the primary region is copied to the secondary region. If the primary region becomes unavailable, the most recent writes may not yet have been copied to the secondary region.

When you force a failover, all data in the primary region is lost as the secondary region becomes the new primary region and the storage account is configured to be locally redundant. All data already copied to the secondary is maintained when the failover happens. However, any data written to the primary that has not also been copied to the secondary is lost permanently.
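The following Python toy model shows why unreplicated writes are lost. `GeoAccount` and its methods are hypothetical. Replication is simplified here to copying a snapshot each time `replicate()` is called, whereas real geo-replication runs continuously in the background.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GeoAccount:
    """Toy model of a geo-redundant account (not an Azure API)."""

    primary: Dict[str, str] = field(default_factory=dict)
    secondary: Dict[str, str] = field(default_factory=dict)
    redundancy: str = "GRS"

    def write(self, key: str, value: str) -> None:
        # The write is acknowledged once it lands in the primary region.
        self.primary[key] = value

    def replicate(self) -> None:
        # Asynchronous copy to the secondary, modeled as a point-in-time snapshot.
        self.secondary = dict(self.primary)

    def failover(self) -> None:
        # The secondary becomes the primary. Anything not yet replicated is gone,
        # and the account is now locally redundant in the new primary region.
        self.primary = self.secondary
        self.secondary = {}
        self.redundancy = "LRS"
```

In this model, a write that was acknowledged after the last `replicate()` call does not survive `failover()`. The same thing happens to data written to the primary after the Last Sync Time.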

The Last Sync Time property indicates the most recent time that data from the primary region is guaranteed to have been written to the secondary region. All data written prior to the last sync time is available on the secondary, while data written after the last sync time may not have been written to the secondary and may be lost. Use this property in the event of an outage to estimate the amount of data loss you may incur by initiating an account failover.

As a best practice, design your application so that you can use the last sync time to evaluate expected data loss. For example, if you are logging all write operations, you can compare the times of your last write operations to the last sync time to determine which writes have not been synced to the secondary.
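Here is a minimal Python sketch of that comparison. It assumes your application keeps its own log of write operations with timezone-aware UTC timestamps. The function `writes_at_risk` and the `(timestamp, operation id)` log format are hypothetical. Retrieving the Last Sync Time value itself is covered in Check the Last Sync Time property for a storage account.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical log entry: (UTC time of the write, application-defined operation id)
WriteRecord = Tuple[datetime, str]


def writes_at_risk(log: List[WriteRecord], last_sync_time: datetime) -> List[str]:
    """Return the ids of logged writes that may not have reached the secondary.

    Only writes made before last_sync_time are guaranteed to be on the
    secondary, so writes at or after it are conservatively treated as at risk.
    """
    return [op_id for ts, op_id in log if ts >= last_sync_time]
```

Treating a write whose timestamp equals the last sync time as at risk errs toward overestimating data loss. For this decision, that is the safer direction.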

For more information about checking the Last Sync Time property, see Check the Last Sync Time property for a storage account.

Use caution when failing back to the original primary

After you fail over from the primary to the secondary region, your storage account is configured to be locally redundant in the new primary region. You can then configure the account for geo-redundancy again. When the account is configured for geo-redundancy again after a failover, the new primary region immediately begins copying data to the new secondary region, which was the primary before the original failover. However, it may take some time before existing data in the primary is fully copied to the new secondary.

After the storage account is reconfigured for geo-redundancy, it's possible to initiate another failover from the new primary back to the new secondary. In this case, the original primary region becomes the primary region again and is configured to be locally redundant. All data in the post-failover primary region (the original secondary) that has not been copied to the new secondary is then lost. If most of the data in the storage account has not been copied to the new secondary before you fail back, you could suffer a major data loss.

To avoid a major data loss, check the value of the Last Sync Time property before failing back. To evaluate expected data loss, compare the last sync time to the last times that data was written to the new primary.
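One way to gate a fail-back is to poll until the Last Sync Time passes the time of the last write to the new primary. The Python sketch below assumes a hypothetical `get_last_sync_time` callable that returns the current Last Sync Time as a timezone-aware `datetime`. The default timeout and polling interval are arbitrary. The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.

```python
import time
from datetime import datetime
from typing import Callable


def wait_until_synced(
    get_last_sync_time: Callable[[], datetime],
    last_write_time: datetime,
    timeout_s: float = 3600.0,
    poll_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once Last Sync Time is later than last_write_time.

    Return False if the timeout expires first, meaning a fail-back now
    would risk losing writes made after the last sync time.
    """
    deadline = clock() + timeout_s
    while True:
        if get_last_sync_time() > last_write_time:
            return True  # every write up to last_write_time is on the secondary
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

This check limits data loss only if no new writes land on the primary after `last_write_time`. Quiescing writers before you start polling makes the result meaningful.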

Initiate an account failover

You can initiate an account failover from the Azure portal, PowerShell, Azure CLI, or the Azure Storage resource provider API. For more information on how to initiate a failover, see Initiate an account failover.

Additional considerations

Review the additional considerations described in this section to understand how your applications and services may be affected when you force a failover.

Storage account containing archived blobs

Storage accounts containing archived blobs support account failover. After failover is complete, all archived blobs need to be rehydrated to an online tier before the account can be configured for geo-redundancy.

Storage resource provider

After a failover is complete, clients can again read and write Azure Storage data in the new primary region. However, the Azure Storage resource provider does not fail over, so resource management operations must still take place in the primary region. If the primary region is unavailable, you will not be able to perform management operations on the storage account.

Because the Azure Storage resource provider does not fail over, the Location property will return the original primary location after the failover is complete.

Azure virtual machines

Azure virtual machines (VMs) do not fail over as part of an account failover. If the primary region becomes unavailable and you fail over to the secondary region, you will need to recreate any VMs after the failover. Also, there is a potential data loss associated with the account failover. Azure recommends the following high availability and disaster recovery guidance specific to virtual machines in Azure.

Azure unmanaged disks

As a best practice, Azure recommends converting unmanaged disks to managed disks. However, if you need to fail over an account that contains unmanaged disks attached to Azure VMs, you will need to shut down the VM before initiating the failover.

Unmanaged disks are stored as page blobs in Azure Storage. When a VM is running in Azure, any unmanaged disks attached to the VM are leased. An account failover cannot proceed when there is a lease on a blob. To perform the failover, follow these steps:

  1. Before you begin, note the names of any unmanaged disks, their logical unit numbers (LUNs), and the VMs to which they are attached. Doing so makes it easier to reattach the disks after the failover.
  2. Shut down the VM.
  3. Delete the VM, but retain the VHD files for the unmanaged disks. Note the time at which you deleted the VM.
  4. Wait until the Last Sync Time has updated and is later than the time at which you deleted the VM. This step is important because if the secondary endpoint has not been fully updated with the VHD files when the failover occurs, the VM may not function properly in the new primary region.
  5. Initiate the account failover.
  6. Wait until the account failover is complete and the secondary region has become the new primary region.
  7. Create a VM in the new primary region and reattach the VHDs.
  8. Start the new VM.

Keep in mind that any data stored in a temporary disk is lost when the VM is shut down.
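You can check the inventory from step 1 and the lease condition before you start. This Python sketch assumes you have already collected the following for each unmanaged disk: its VHD page blob name, LUN, attached VM name, and blob lease state (for example, from the VM configuration and blob properties). `UnmanagedDisk` and `preflight` are hypothetical names. The sketch treats the `leased` and `breaking` lease states as still holding a lease.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Lease states in which the blob is still held by a lease (assumption noted above).
ACTIVE_LEASE_STATES = {"leased", "breaking"}


@dataclass(frozen=True)
class UnmanagedDisk:
    vhd_blob: str     # page blob that stores the VHD
    lun: int          # logical unit number on the VM
    vm_name: str      # VM the disk is attached to
    lease_state: str  # blob lease state, e.g. "leased" or "available"


def preflight(
    disks: List[UnmanagedDisk],
) -> Tuple[List[Tuple[str, int, str]], List[str]]:
    """Return (inventory, blocked_blobs).

    inventory: (vm_name, lun, vhd_blob) tuples sorted by VM and LUN, to record
    before deleting the VMs so the disks can be reattached later.
    blocked_blobs: VHD blobs whose leases would prevent the account failover.
    """
    inventory = sorted((d.vm_name, d.lun, d.vhd_blob) for d in disks)
    blocked = [d.vhd_blob for d in disks if d.lease_state in ACTIVE_LEASE_STATES]
    return inventory, blocked
```

A non-empty `blocked` list means at least one VM still holds a lease, so steps 2 and 3 are not complete for that disk.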

Unsupported features and services

The following features and services are not supported for account failover:

  • ADLS Gen2 storage accounts (accounts that have hierarchical namespace enabled) are not supported at this time.
  • A storage account containing premium block blobs cannot be failed over. Storage accounts that support premium block blobs do not currently support geo-redundancy.
  • A storage account that contains any containers with a WORM immutability policy enabled cannot be failed over. Unlocked or locked time-based retention policies and legal hold policies prevent failover in order to maintain compliance.
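A pre-failover check for the unsupported configurations listed above might look like the following Python sketch. `AccountFacts` and `failover_blockers` are hypothetical names. How you populate the fields from your account settings and container policies is left as an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccountFacts:
    hierarchical_namespace: bool  # True for ADLS Gen2 accounts
    premium_block_blobs: bool     # True if the account holds premium block blobs
    worm_containers: int          # containers with time-based retention or legal hold


def failover_blockers(facts: AccountFacts) -> List[str]:
    """Return human-readable reasons why account failover is not supported."""
    blockers: List[str] = []
    if facts.hierarchical_namespace:
        blockers.append("hierarchical namespace (ADLS Gen2) is enabled")
    if facts.premium_block_blobs:
        blockers.append("account contains premium block blobs")
    if facts.worm_containers > 0:
        blockers.append(
            f"{facts.worm_containers} container(s) have a WORM immutability policy"
        )
    return blockers
```

Running a check like this while you write your disaster recovery plan, not during an outage, gives you time to choose an alternative such as copying data to another account.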

Copying data as an alternative to failover

If your storage account is configured for read access to the secondary, you can design your application to read from the secondary endpoint. If you prefer not to fail over in the event of an outage in the primary region, you can use tools such as AzCopy, Azure PowerShell, or the Azure Data Movement library to copy data from your storage account in the secondary region to another storage account in an unaffected region. You can then point your applications to that storage account for both read and write availability.

Caution

An account failover should not be used as part of your data migration strategy.

Azure-managed failover

In extreme circumstances where a region is lost due to a significant disaster, Azure may initiate a regional failover. In this case, no action on your part is required. Until the Azure-managed failover has completed, you won't have write access to your storage account. Your applications can read from the secondary region if your storage account is configured for RA-GRS.
