Designing for disaster recovery with ExpressRoute private peering

ExpressRoute is designed for high availability to provide carrier-grade private network connectivity to Azure resources. In other words, there is no single point of failure in the ExpressRoute path within the Azure network. For design considerations to maximize the availability of an ExpressRoute circuit, see Designing for high availability with ExpressRoute.

However, taking Murphy's popular adage (if anything can go wrong, it will) into consideration, this article focuses on solutions that go beyond the failures that can be addressed using a single ExpressRoute circuit. In other words, it looks at network architecture considerations for building robust backend network connectivity for disaster recovery using geo-redundant ExpressRoute circuits.

Need for redundant connectivity solution

There are instances where an entire regional service (be it that of Azure, a network service provider, the customer, or another cloud service provider) gets degraded. The root causes of such region-wide service impact include natural calamities. Therefore, for business continuity and mission-critical applications, it is important to plan for disaster recovery.

Irrespective of whether you run your mission-critical applications in an Azure region, on-premises, or anywhere else, you can use another Azure region as your failover site. The following articles address disaster recovery from applications and frontend access perspectives:

If you rely on ExpressRoute connectivity between your on-premises network and Azure for mission-critical operations, your disaster recovery plan should also include geo-redundant network connectivity.

Challenges of using multiple ExpressRoute circuits

When you interconnect the same set of networks using more than one connection, you introduce parallel paths between the networks. Parallel paths, when not properly architected, can lead to asymmetrical routing. If you have stateful entities (for example, NAT or a firewall) in the path, asymmetrical routing can block traffic flow. Typically, you won't come across stateful entities such as NAT or firewalls on the ExpressRoute private peering path. Therefore, asymmetrical routing over ExpressRoute private peering does not necessarily block traffic flow.

However, if you load-balance traffic across geo-redundant parallel paths, you will experience inconsistent network performance regardless of whether you have stateful entities in the path. This article discusses how to address these challenges.

Small to medium on-premises network considerations

Let's consider the example network illustrated in the following diagram. In the example, geo-redundant ExpressRoute connectivity is established between Contoso's on-premises location and Contoso's VNet in an Azure region. In the diagram, the solid green line indicates the preferred path (via ExpressRoute 1) and the dotted line represents the standby path (via ExpressRoute 2).

[Figure 1]

When you design ExpressRoute connectivity for disaster recovery, you need to consider:

  • using geo-redundant ExpressRoute circuits
  • using diverse service provider networks for the different ExpressRoute circuits
  • designing each ExpressRoute circuit for high availability
  • terminating the different ExpressRoute circuits in different locations on the customer network

By default, if you advertise routes identically over all the ExpressRoute paths, Azure load-balances on-premises-bound traffic across all the ExpressRoute paths using equal-cost multi-path (ECMP) routing.
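To make the effect of ECMP concrete, here is a minimal sketch of per-flow path selection. The hashing scheme (SHA-256 over a 5-tuple), the IP addresses, and the function name `pick_path` are assumptions for illustration only; the article does not describe how Azure actually hashes flows.

```python
import hashlib

# Equal-cost paths as seen from Azure (names taken from the example network).
CIRCUITS = ["ExpressRoute 1", "ExpressRoute 2"]

def pick_path(src_ip, dst_ip, src_port, dst_port, protocol, paths):
    """Map a flow's 5-tuple to one of the equal-cost paths (illustrative hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]

# Hypothetical flows from a VNet host to an on-premises host.
flows = [("10.2.0.4", "10.1.11.10", 50000 + i, 443, "tcp") for i in range(8)]
assignment = {flow: pick_path(*flow, CIRCUITS) for flow in flows}

for flow, circuit in assignment.items():
    print(f"source port {flow[2]} -> {circuit}")
```

Each flow stays pinned to one circuit, but different flows between the same two hosts can land on different circuits, which is how geo-redundant paths with different latencies produce inconsistent performance.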

However, with geo-redundant ExpressRoute circuits you need to take into account that different network paths offer different network performance (particularly network latency). To get more consistent network performance during normal operation, you may want to prefer the ExpressRoute circuit that offers the lowest latency.

You can influence Azure to prefer one ExpressRoute circuit over another using one of the following techniques (listed in order of effectiveness):

  • advertising a more specific route over the preferred ExpressRoute circuit than over the other ExpressRoute circuits
  • configuring a higher connection weight on the connection that links the virtual network to the preferred ExpressRoute circuit
  • advertising the routes over the less preferred ExpressRoute circuit with a longer AS path (AS path prepend)

More specific route

The following diagram illustrates influencing ExpressRoute path selection using more specific route advertisement. In the illustrated example, Contoso's on-premises /24 IP range is advertised as two /25 address ranges via the preferred path (ExpressRoute 1) and as a /24 via the standby path (ExpressRoute 2).

[Figure 2]

Because /25 is more specific than /24, Azure sends traffic destined for 10.1.11.0/24 via ExpressRoute 1 in the normal state. If both connections of ExpressRoute 1 go down, the VNet sees the 10.1.11.0/24 route advertisement only via ExpressRoute 2, so the standby circuit is used in this failure state.
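The longest-prefix-match behavior described above can be sketched as follows. This is a simplified model, not Azure's implementation: the route table layout and the function name `select_circuit` are hypothetical, and the two /25 prefixes are simply the halves of 10.1.11.0/24.

```python
import ipaddress

# Hypothetical view of the routes the VNet has learned: (prefix, advertising circuit).
ROUTES = [
    ("10.1.11.0/25", "ExpressRoute 1"),
    ("10.1.11.128/25", "ExpressRoute 1"),
    ("10.1.11.0/24", "ExpressRoute 2"),
]

def select_circuit(destination, routes):
    """Return the circuit of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), circuit)
        for prefix, circuit in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific route (largest prefix length) wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]

# Normal state: the /25 routes via ExpressRoute 1 are more specific.
normal = select_circuit("10.1.11.200", ROUTES)

# Failure state: both connections of ExpressRoute 1 are down, so its routes are withdrawn.
remaining = [route for route in ROUTES if route[1] != "ExpressRoute 1"]
failover = select_circuit("10.1.11.200", remaining)

print(normal, "/", failover)
```

Because the /24 stays advertised over ExpressRoute 2 the whole time, failover needs no configuration change: withdrawing the /25 routes is enough to shift traffic.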

Connection weight

The following screenshot illustrates configuring the weight of an ExpressRoute connection via the Azure portal.

[Figure 3]

The following diagram illustrates influencing ExpressRoute path selection using connection weight. The default connection weight is 0. In the example below, the weight of the connection for ExpressRoute 1 is configured as 100. When a VNet receives a route prefix advertised via more than one ExpressRoute circuit, the VNet prefers the connection with the highest weight.

[Figure 4]

If both connections of ExpressRoute 1 go down, the VNet sees the 10.1.11.0/24 route advertisement only via ExpressRoute 2, so the standby circuit is used in this failure state.
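As a rough model of weight-based selection, the sketch below picks the available connection with the highest weight for a prefix received over both circuits. The `Connection` class and `preferred_connection` function are hypothetical names for illustration; only the default weight of 0 and the example weight of 100 come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    circuit: str
    weight: int = 0   # default connection weight
    up: bool = True   # False when both connections of the circuit are down

def preferred_connection(connections):
    """Return the available connection with the highest weight, or None."""
    available = [conn for conn in connections if conn.up]
    if not available:
        return None
    return max(available, key=lambda conn: conn.weight)

# Normal state: ExpressRoute 1 carries weight 100, ExpressRoute 2 keeps the default.
normal = preferred_connection([
    Connection("ExpressRoute 1", weight=100),
    Connection("ExpressRoute 2"),
])

# Failure state: ExpressRoute 1 is down, so only ExpressRoute 2 offers the route.
failover = preferred_connection([
    Connection("ExpressRoute 1", weight=100, up=False),
    Connection("ExpressRoute 2"),
])

print(normal.circuit, "/", failover.circuit)
```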

AS path prepend

The following diagram illustrates influencing ExpressRoute path selection using AS path prepend. In the diagram, the route advertisement over ExpressRoute 1 shows the default behavior of eBGP. In the route advertisement over ExpressRoute 2, the on-premises network's ASN is additionally prepended to the route's AS path. When the same route is received through multiple ExpressRoute circuits, the VNet prefers the route with the shortest AS path, per the eBGP route selection process.

[Figure 5]

If both connections of ExpressRoute 1 go down, the VNet sees the 10.1.11.0/24 route advertisement only via ExpressRoute 2. Consequently, the longer AS path becomes irrelevant, and the standby circuit is used in this failure state.
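The AS path comparison can be sketched as below. The on-premises ASN 65020 is a hypothetical private ASN, the helper names are made up for illustration, and the model looks only at AS path length, ignoring the other steps of BGP best-path selection.

```python
ON_PREM_ASN = 65020  # hypothetical private ASN for the on-premises network

def advertise(prefix, circuit, prepend=0):
    """Build a route as Azure would receive it, with optional extra ASN copies."""
    return {
        "prefix": prefix,
        "circuit": circuit,
        "as_path": [ON_PREM_ASN] * (1 + prepend),
    }

def best_route(routes):
    """Prefer the route with the shortest AS path (simplified selection)."""
    if not routes:
        return None
    return min(routes, key=lambda route: len(route["as_path"]))

primary = advertise("10.1.11.0/24", "ExpressRoute 1")            # default eBGP behavior
standby = advertise("10.1.11.0/24", "ExpressRoute 2", prepend=1)  # ASN prepended once

normal = best_route([primary, standby])
failover = best_route([standby])  # ExpressRoute 1 route withdrawn

print(normal["circuit"], "/", failover["circuit"])
```

With only one candidate left, the length of its AS path no longer matters, which is why the prepended route is still used in the failure state.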

Whichever technique you use to influence Azure to prefer one of your ExpressRoute circuits over the others, you also need to ensure that the on-premises network prefers the same ExpressRoute path for Azure-bound traffic, to avoid asymmetric flows. Typically, the local preference value is used to influence the on-premises network to prefer one ExpressRoute circuit over the others. Local preference is an internal BGP (iBGP) metric. The BGP route with the highest local preference value is preferred.
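One way to reason about symmetry is to compare the circuit each side would pick. The sketch below assumes Azure is steered by connection weight and the on-premises routers by local preference; the function names and the local preference values (100 and 200) are hypothetical.

```python
def azure_choice(connection_weights):
    """Circuit Azure prefers for on-premises-bound traffic: highest weight."""
    return max(connection_weights, key=connection_weights.get)

def on_prem_choice(local_preferences):
    """Circuit on-premises routers prefer for Azure-bound traffic: highest local preference."""
    return max(local_preferences, key=local_preferences.get)

def is_symmetric(connection_weights, local_preferences):
    """True when both directions of a flow use the same circuit."""
    return azure_choice(connection_weights) == on_prem_choice(local_preferences)

weights = {"ExpressRoute 1": 100, "ExpressRoute 2": 0}

# Consistent design: both sides prefer ExpressRoute 1.
aligned = {"ExpressRoute 1": 200, "ExpressRoute 2": 100}
# Inconsistent design: on-premises prefers ExpressRoute 2, so return traffic diverges.
misaligned = {"ExpressRoute 1": 100, "ExpressRoute 2": 200}

print(is_symmetric(weights, aligned), is_symmetric(weights, misaligned))
```

A check like this is only a planning aid; verifying the actual paths still requires testing failover on the live circuits.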

Important

When you use certain ExpressRoute circuits as standby, you need to actively manage them and periodically test the failover operation.

Large distributed enterprise network

When you have a large distributed enterprise network, you're likely to have multiple ExpressRoute circuits. This section shows how to design disaster recovery using active-active ExpressRoute circuits, without needing additional standby circuits.

Let's consider the example illustrated in the following diagram. In the example, Contoso has two on-premises locations connected to two Contoso IaaS deployments in two different Azure regions via ExpressRoute circuits in two different peering locations.

[Figure 6]

How you architect disaster recovery affects how cross-region to cross-location traffic (region 1/region 2 to location 2/location 1) is routed. Let's consider two different disaster recovery architectures that route cross-region to cross-location traffic differently.

Scenario 1

In the first scenario, let's design disaster recovery such that all the traffic between an Azure region and the on-premises network flows through the local ExpressRoute circuit in the steady state. If the local ExpressRoute circuit fails, the remote ExpressRoute circuit is used for all traffic flows between Azure and the on-premises network.

Scenario 1 is illustrated in the following diagram. In the diagram, the green lines indicate paths for traffic flow between VNet1 and the on-premises networks. The blue lines indicate paths for traffic flow between VNet2 and the on-premises networks. Solid lines indicate the desired path in the steady state, and dashed lines indicate the traffic path when the corresponding ExpressRoute circuit that carries the steady-state traffic fails.

[Figure 7]

You can architect the scenario using connection weight to influence the VNets to prefer the connection to the ExpressRoute circuit in the local peering location for traffic bound for the on-premises network. To complete the solution, you need to ensure symmetrical reverse traffic flow. You can use local preference on the iBGP session between your BGP routers (on which the ExpressRoute circuits terminate on the on-premises side) to prefer an ExpressRoute circuit. The solution is illustrated in the following diagram.

[Figure 8]

Scenario 2

Scenario 2 is illustrated in the following diagram. In the diagram, the green lines indicate paths for traffic flow between VNet1 and the on-premises networks. The blue lines indicate paths for traffic flow between VNet2 and the on-premises networks. In the steady state (solid lines in the diagram), all the traffic between the VNets and the on-premises locations flows via the Microsoft backbone for the most part, and flows through the interconnection between the on-premises locations only in the failure state of an ExpressRoute circuit (dotted lines in the diagram).

[Figure 9]

The solution is illustrated in the following diagram. As illustrated, you can architect the scenario using either a more specific route (Option 1) or AS path prepend (Option 2) to influence VNet path selection. To influence on-premises network route selection for Azure-bound traffic, you need to configure the interconnection between the on-premises locations as less preferable. How you configure the interconnection link as less preferable depends on the routing protocol used within the on-premises network. You can use local preference with iBGP, or metric with an IGP (OSPF or IS-IS).

[Figure 10]

Next steps

In this article, we discussed how to design for disaster recovery of ExpressRoute private peering connectivity. The following articles address disaster recovery from applications and frontend access perspectives: