What to do in the event of an Azure service disruption that impacts Azure Cloud Services

At Microsoft, we work hard to make sure that our services are always available to you when you need them. However, forces beyond our control sometimes cause unplanned service disruptions.

Microsoft provides a Service Level Agreement (SLA) for its services as a commitment for uptime and connectivity. The SLA for each Azure service is listed in Azure Service Level Agreements.

Azure already has many built-in platform features that support highly available applications. For more about these features, read Disaster recovery and high availability for Azure applications.

This article covers a true disaster recovery scenario, in which a whole region experiences an outage due to a major natural disaster or a widespread service interruption. These are rare occurrences, but you must prepare for the possibility that an entire region goes down. If an entire region experiences a service disruption, the locally redundant copies of your data are temporarily unavailable. If you have enabled geo-replication, three additional copies of your Azure Storage blobs and tables are stored in a different region. In the event of a complete regional outage or a disaster from which the primary region cannot be recovered, Azure remaps all of the DNS entries to the geo-replicated region.

Note

Be aware that you have no control over this process, and that it occurs only for datacenter-wide service disruptions. Because of this, you must also rely on other application-specific backup strategies to achieve the highest level of availability. For more information, see Disaster recovery and high availability for applications built on Azure. If you want to be able to act on a failover yourself, consider using read-access geo-redundant storage (RA-GRS), which creates a read-only copy of your data in another region.
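With RA-GRS, an application can read the read-only copy in the secondary region without waiting for a Microsoft-initiated failover. The sketch below shows a minimal primary-then-secondary read pattern. It assumes the blob service's secondary endpoint follows the `<account>-secondary` host name convention. The `fetch` function, the account, container, and blob names, and the use of `ConnectionError` as the failure signal are hypothetical placeholders rather than part of any Azure SDK.

```python
from typing import Callable

# Hypothetical fetcher: takes a URL, returns the bytes at that URL, and raises
# ConnectionError if the endpoint is unreachable. Swap in your real HTTP/SDK call.
Fetcher = Callable[[str], bytes]


def blob_urls(account: str, container: str, blob: str) -> tuple[str, str]:
    """Return (primary, secondary) URLs for a blob in an RA-GRS account."""
    primary = f"https://{account}.blob.core.windows.net/{container}/{blob}"
    # The read-only secondary copy is served from the '<account>-secondary' host.
    secondary = f"https://{account}-secondary.blob.core.windows.net/{container}/{blob}"
    return primary, secondary


def read_with_fallback(fetch: Fetcher, account: str, container: str, blob: str) -> tuple[str, bytes]:
    """Read from the primary region; if it is unreachable, read the secondary copy.

    Returns a label for the endpoint that served the data, plus the data itself.
    """
    primary, secondary = blob_urls(account, container, blob)
    try:
        return "primary", fetch(primary)
    except ConnectionError:
        # Primary region unreachable: fall back to the read-only secondary.
        # Writes are not possible here, and because geo-replication is
        # asynchronous, the secondary copy may lag behind the primary.
        return "secondary", fetch(secondary)


# Usage: simulate a primary-region outage with a fake fetcher.
def fake_fetch(url: str) -> bytes:
    if "-secondary" not in url:
        raise ConnectionError("primary region down")
    return b"hello"


print(read_with_fallback(fake_fetch, "contoso", "data", "report.txt"))
# ('secondary', b'hello')
```

Because the secondary is read-only, a design like this usually also needs a way to queue or reject writes until the primary region returns.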

Option 1: Use a backup deployment through Azure Traffic Manager

The most robust disaster recovery solution involves maintaining multiple deployments of your application in different regions, then using Azure Traffic Manager to direct traffic between them. Azure Traffic Manager provides multiple routing methods, so you can choose whether to manage your deployments using a primary/backup model or to split traffic between them.
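Traffic Manager makes its routing decisions inside Azure at the DNS level, so you configure it rather than code it. The sketch below only models the selection logic behind the two approaches described above. Priority routing implements the primary/backup model, where the healthy endpoint with the lowest priority number wins. Weighted routing splits traffic in proportion to endpoint weights. The `Endpoint` class and the `*.cloudapp.net` host names are hypothetical illustrations, not Traffic Manager's API.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str          # hypothetical Cloud Services host name
    priority: int      # lower value = preferred (priority routing)
    weight: int        # relative share of traffic (weighted routing)
    healthy: bool = True


def route_priority(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Primary/backup model: the healthy endpoint with the lowest priority value."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


def route_weighted(endpoints: List[Endpoint], rng: random.Random) -> Optional[Endpoint]:
    """Split traffic: pick a healthy endpoint at random in proportion to its weight."""
    healthy = [e for e in endpoints if e.healthy and e.weight > 0]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[e.weight for e in healthy], k=1)[0]


# Usage: one deployment per region.
eps = [
    Endpoint("myapp-eastus.cloudapp.net", priority=1, weight=50),
    Endpoint("myapp-westus.cloudapp.net", priority=2, weight=50),
]
print(route_priority(eps).name)  # myapp-eastus.cloudapp.net
eps[0].healthy = False           # simulate loss of the primary region
print(route_priority(eps).name)  # myapp-westus.cloudapp.net

eps[0].healthy = True
rng = random.Random(42)
picks = [route_weighted(eps, rng).name for _ in range(1000)]
print(picks.count("myapp-eastus.cloudapp.net"))  # roughly half of the 1000 picks
```

In both models, an endpoint drops out of the answer set only once it is marked unhealthy, which is why endpoint monitoring matters for failover speed.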

Figure: Balancing Azure Cloud Services across regions with Azure Traffic Manager

For the fastest response to the loss of a region, it is important that you configure Traffic Manager's endpoint monitoring.
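Failover speed depends on how quickly the monitor notices a failed endpoint and how long clients keep using cached DNS answers. The sketch below models that reasoning with three settings: a probing interval, a number of tolerated consecutive failures, and the DNS TTL. The values are illustrative assumptions, not necessarily Traffic Manager's defaults. In this simplified model, a single successful probe clears the failure count. The worst-case estimate is a rough approximation that ignores probe timeouts and resolvers that do not honor the TTL.

```python
from dataclasses import dataclass


@dataclass
class ProbeSettings:
    interval_s: int = 30          # seconds between health probes (illustrative)
    tolerated_failures: int = 3   # consecutive failures allowed before degraded (illustrative)
    dns_ttl_s: int = 60           # TTL of the DNS answer clients cache (illustrative)


class EndpointMonitor:
    """Track consecutive probe failures and flag an endpoint as degraded."""

    def __init__(self, settings: ProbeSettings) -> None:
        self.settings = settings
        self.consecutive_failures = 0
        self.degraded = False

    def record_probe(self, ok: bool) -> bool:
        """Record one probe result; return True if the endpoint is now degraded."""
        if ok:
            # In this model, one successful probe resets the failure count.
            self.consecutive_failures = 0
            self.degraded = False
        else:
            self.consecutive_failures += 1
            # Degraded once failures exceed the tolerated count.
            self.degraded = self.consecutive_failures > self.settings.tolerated_failures
        return self.degraded


def worst_case_failover_s(s: ProbeSettings) -> int:
    """Rough upper bound on seconds until clients stop resolving to a failed endpoint.

    (tolerated_failures + 1) probe intervals to detect the failure,
    plus one DNS TTL for cached answers to expire.
    """
    return s.interval_s * (s.tolerated_failures + 1) + s.dns_ttl_s


# Usage
monitor = EndpointMonitor(ProbeSettings())
for ok in [True, False, False, False, False]:
    monitor.record_probe(ok)
print(monitor.degraded)                        # True (4 failures > 3 tolerated)
print(worst_case_failover_s(ProbeSettings()))  # 180
print(worst_case_failover_s(ProbeSettings(interval_s=10, tolerated_failures=0, dns_ttl_s=10)))  # 20
```

Shorter probe intervals, fewer tolerated failures, and a lower TTL all shrink the estimate, at the cost of more probe traffic and DNS queries.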

Option 2: Deploy your application to a new region

Maintaining multiple active deployments as described in the previous option incurs additional ongoing costs. If your recovery time objective (RTO) is flexible enough and you have the original code or the compiled Cloud Services package, you can create a new instance of your application in another region and update your DNS records to point to the new deployment.
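Whether this option is "flexible enough" comes down to arithmetic: the end-to-end time to stand up the new deployment must fit within your RTO. The sketch below adds up hypothetical step durations and compares the total with the RTO. The step breakdown and all minute values are placeholder assumptions; measure your own, for example during a recovery drill. It also assumes the steps run one after another, with the DNS change made only once the new deployment is running, so clients pick it up after their cached records expire.

```python
from dataclasses import dataclass


@dataclass
class RedeployEstimate:
    # All values are hypothetical estimates in minutes; replace with measured ones.
    provision_min: float        # create the cloud service in the new region
    deploy_package_min: float   # upload and start the compiled package
    data_recovery_min: float    # restore or repoint application data sources
    dns_ttl_min: float          # time for cached DNS records to expire


def redeploy_meets_rto(est: RedeployEstimate, rto_min: float) -> tuple[bool, float]:
    """Return (fits_within_rto, estimated_total_minutes) for redeploying to a new region."""
    total = (est.provision_min + est.deploy_package_min
             + est.data_recovery_min + est.dns_ttl_min)
    return total <= rto_min, total


# Usage
est = RedeployEstimate(provision_min=15, deploy_package_min=20,
                       data_recovery_min=30, dns_ttl_min=5)
print(redeploy_meets_rto(est, rto_min=120))  # (True, 70)
print(redeploy_meets_rto(est, rto_min=60))   # (False, 70)
```

If the estimate does not fit, a standby deployment behind Traffic Manager is likely the better trade-off despite its ongoing cost.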

For more detail about how to create and deploy a cloud service application, see How to create and deploy a cloud service.

Depending on your application's data sources, you may also need to follow the recovery procedures for each of those data sources.

Option 3: Wait for recovery

In this case, no action on your part is required, but your service remains unavailable until the region is restored. You can see the current service status on the Azure Service Health Dashboard.

Next steps

To learn more about how to implement a disaster recovery and high availability strategy, see Disaster recovery and high availability for Azure applications.