Build your own disaster recovery for custom topics in Event Grid

Disaster recovery focuses on recovering from a severe loss of application functionality. This tutorial walks you through setting up your eventing architecture so that it can recover if the Event Grid service becomes unhealthy in a particular region.

In this tutorial, you'll learn how to create an active-passive failover architecture for custom topics in Event Grid. You'll accomplish failover by mirroring your topics and subscriptions across two regions and then managing a failover when a topic becomes unhealthy. The architecture in this tutorial fails over all new traffic. Be aware that, with this setup, events already in flight won't be recovered until the compromised region is healthy again.

Note

Event Grid now supports automatic geo disaster recovery (GeoDR) on the server side. You can still implement client-side disaster recovery logic if you want greater control over the failover process. For details about automatic GeoDR, see Server-side geo disaster recovery in Azure Event Grid.

Create a message endpoint

To test your failover configuration, you'll need an endpoint to receive your events. The endpoint isn't part of your failover infrastructure; it acts as the event handler so that the setup is easier to test.

To simplify testing, deploy a pre-built web app that displays the event messages. The deployed solution includes an App Service plan, an App Service web app, and source code from GitHub.

  1. Select Deploy to Azure to deploy the solution to your subscription. In the Azure portal, provide values for the parameters.

  2. The deployment may take a few minutes to complete. After the deployment succeeds, view your web app to make sure it's running. In a web browser, navigate to https://<your-site-name>.chinacloudsites.cn and note this URL, because you'll need it later.

  3. You should see the site, but no events have been posted to it yet.

Enable Event Grid resource provider

If you haven't previously used Event Grid in your Azure subscription, you may need to register the Event Grid resource provider.

In the Azure portal:

  1. Select Subscriptions.
  2. Select the subscription you're using for Event Grid.
  3. Under Settings, select Resource providers.
  4. Find Microsoft.EventGrid.
  5. If it isn't registered, select Register.

It may take a moment for the registration to finish. Select Refresh to update the status. When Status is Registered, you're ready to continue.

Create your primary and secondary topics

First, create two Event Grid topics. These topics act as your primary and secondary. By default, your events flow through your primary topic. If there's a service outage in the primary region, your secondary topic takes over.

  1. Sign in to the Azure portal.

  2. From the upper-left corner of the main Azure menu, choose All services, search for Event Grid, and then select Event Grid Topics.

    Select the star next to Event Grid Topics to add it to the resource menu for easier access in the future.

  3. In the Event Grid Topics menu, select +Add to create your primary topic.

    • Give the topic a logical name and add "-primary" as a suffix to make it easy to track.

    • This topic's region will be your primary region.

  4. Once the topic has been created, navigate to it and copy the Topic Endpoint. You'll need the URI later.

  5. Get the access key for the topic, which you'll also need later. Select Access keys in the resource menu and copy Key 1.

  6. In the Topic blade, select +Event Subscription to create an event subscription that connects the topic to the event receiver website you deployed earlier in this tutorial.

    • Give the event subscription a logical name and add "-primary" as a suffix to make it easy to track.

    • Select Web Hook as the Endpoint Type.

    • Set the endpoint to your event receiver's event URL, which should look something like: https://<your-event-receiver>.chinacloudsites.cn/api/updates

  7. Repeat the same flow to create your secondary topic and subscription. This time, replace the "-primary" suffix with "-secondary" for easier tracking. Finally, make sure you put them in a different Azure region. Although you can put them in any region you want, we recommend using the Azure paired region of your primary region. Putting the secondary topic and subscription in a different region ensures that your new events can still flow even if the primary region goes down.

You should now have:

  • An event receiver website for testing.
  • A primary topic in your primary region.
  • A primary event subscription connecting your primary topic to the event receiver website.
  • A secondary topic in your secondary region.
  • A secondary event subscription connecting your secondary topic to the event receiver website.

Implement client-side failover

Now that you have a regionally redundant pair of topics and subscriptions set up, you're ready to implement client-side failover. There are several ways to do this, but all failover implementations share a common feature: if one topic is no longer healthy, traffic is redirected to the other topic.
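That shared pattern can be sketched independently of any SDK. In this minimal sketch, `FailoverPublisher.PublishAsync` is a hypothetical helper. Each delegate is assumed to wrap a publish call to one topic, and any exception from the primary is treated as a reason to fail over.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper showing the shared failover pattern: publish to the
// primary topic and, if that throws, publish the same events to the secondary.
public static class FailoverPublisher
{
    // Returns "primary" or "secondary" to report which topic accepted the events.
    public static async Task<string> PublishAsync(
        Func<Task> publishToPrimary,
        Func<Task> publishToSecondary,
        Action<Exception> onPrimaryFailure = null)
    {
        try
        {
            await publishToPrimary();
            return "primary";
        }
        catch (Exception ex)
        {
            // Any failure on the primary redirects traffic to the secondary.
            // If the secondary also fails, its exception propagates to the caller.
            onPrimaryFailure?.Invoke(ex);
            await publishToSecondary();
            return "secondary";
        }
    }
}
```

A real publisher would probably narrow the `catch` to the exceptions its SDK throws for rejected or unreachable requests, so that programming errors aren't hidden by a failover.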

Basic client-side implementation

The following sample code is a simple .NET publisher that always attempts to publish to your primary topic first. If that doesn't succeed, it fails over to the secondary topic. In either case, it also checks the health API of the other topic by doing a GET on https://<topic-name>.<topic-region>.eventgrid.azure.cn/api/health. A healthy topic should always respond with 200 OK when a GET is made on the /api/health endpoint.

using System;
using System.Net.Http;
using System.Collections.Generic;
using Microsoft.Azure.EventGrid;
using Microsoft.Azure.EventGrid.Models;
using Newtonsoft.Json;

namespace EventGridFailoverPublisher
{
    // This captures the "Data" portion of an EventGridEvent on a custom topic
    class FailoverEventData
    {
        [JsonProperty(PropertyName = "teststatus")]
        public string TestStatus { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // TODO: Enter the endpoint for each topic. You can find this value in the
            // "Overview" section of the "Event Grid Topics" blade in the Azure portal.
            string primaryTopic = "https://<primary-topic-name>.<primary-topic-region>.eventgrid.azure.cn/api/events";
            string secondaryTopic = "https://<secondary-topic-name>.<secondary-topic-region>.eventgrid.azure.cn/api/events";

            // TODO: Enter the key for each topic. You can find it in the "Access keys" section
            // of the "Event Grid Topics" blade in the Azure portal.
            string primaryTopicKey = "<your-primary-topic-key>";
            string secondaryTopicKey = "<your-secondary-topic-key>";

            // The SDK publishes by host name, and the health probe lives on the same host.
            string primaryTopicHostname = new Uri(primaryTopic).Host;
            string secondaryTopicHostname = new Uri(secondaryTopic).Host;

            Uri primaryTopicHealthProbe = new Uri("https://" + primaryTopicHostname + "/api/health");
            Uri secondaryTopicHealthProbe = new Uri("https://" + secondaryTopicHostname + "/api/health");

            var httpClient = new HttpClient();

            try
            {
                // Always try the primary topic first.
                TopicCredentials topicCredentials = new TopicCredentials(primaryTopicKey);
                EventGridClient client = new EventGridClient(topicCredentials);

                client.PublishEventsAsync(primaryTopicHostname, GetEventsList()).GetAwaiter().GetResult();
                Console.WriteLine("Published events to primary Event Grid topic.");

                // Report whether the standby (secondary) topic is healthy.
                Console.WriteLine("\nSecondary topic health: " + ProbeHealth(httpClient, secondaryTopicHealthProbe));
            }
            catch (Exception e) when (e is Microsoft.Rest.Azure.CloudException || e is HttpRequestException)
            {
                // The primary topic rejected the request (CloudException) or couldn't be
                // reached (HttpRequestException), so fail over to the secondary topic.
                TopicCredentials topicCredentials = new TopicCredentials(secondaryTopicKey);
                EventGridClient client = new EventGridClient(topicCredentials);

                client.PublishEventsAsync(secondaryTopicHostname, GetEventsList()).GetAwaiter().GetResult();
                Console.WriteLine("Published events to secondary Event Grid topic. Reason for primary topic failure:\n\n" + e);

                Console.WriteLine("\nPrimary topic health: " + ProbeHealth(httpClient, primaryTopicHealthProbe));
            }

            Console.ReadLine();
        }

        // Sends a GET to /api/health; a healthy topic responds with 200 OK.
        // Returns the status code, or "unreachable" if the request itself fails,
        // so a down region can't crash the publisher after it has failed over.
        static string ProbeHealth(HttpClient httpClient, Uri healthProbe)
        {
            try
            {
                HttpResponseMessage health = httpClient.GetAsync(healthProbe).GetAwaiter().GetResult();
                return (int)health.StatusCode + " " + health.StatusCode;
            }
            catch (HttpRequestException e)
            {
                return "unreachable (" + e.Message + ")";
            }
        }

        static IList<EventGridEvent> GetEventsList()
        {
            List<EventGridEvent> eventsList = new List<EventGridEvent>();

            for (int i = 0; i < 5; i++)
            {
                eventsList.Add(new EventGridEvent()
                {
                    Id = Guid.NewGuid().ToString(),
                    EventType = "Contoso.Failover.Test",
                    Data = new FailoverEventData()
                    {
                        TestStatus = "success"
                    },
                    // Event timestamps should be in UTC.
                    EventTime = DateTime.UtcNow,
                    Subject = "test" + i,
                    DataVersion = "2.0"
                });
            }

            return eventsList;
        }
    }
}
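The sample only prints the status of the health endpoint. If you want to act on the result, you can use a reusable probe that returns a plain healthy or unhealthy flag, sketched below. `TopicHealth`, `HealthProbeFor`, and `IsHealthyAsync` are hypothetical names. The sketch treats any response other than 200 OK, including network errors and timeouts, as unhealthy. That rule is an assumption based on the health API behavior described above.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helpers for probing a topic's /api/health endpoint.
public static class TopicHealth
{
    // Builds https://<topic-host>/api/health from the topic endpoint copied from the portal.
    public static Uri HealthProbeFor(string topicEndpoint) =>
        new Uri("https://" + new Uri(topicEndpoint).Host + "/api/health");

    // True only for 200 OK; other status codes, network errors, and timeouts count as unhealthy.
    public static async Task<bool> IsHealthyAsync(HttpClient httpClient, Uri healthProbe)
    {
        try
        {
            using (HttpResponseMessage response = await httpClient.GetAsync(healthProbe))
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (HttpRequestException)
        {
            // DNS failure, refused connection, and similar transport errors.
            return false;
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports a timeout as TaskCanceledException.
            return false;
        }
    }
}
```

Returning `false` instead of throwing keeps callers simple. A periodic checker, for example, can store the result directly without its own error handling.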

Try it out

Now that you have all of your components in place, you can test your failover implementation. Run the sample above in Visual Studio Code or your favorite environment. Replace the following four values with the endpoints and keys from your topics:

  • primaryTopic - the endpoint for your primary topic.
  • secondaryTopic - the endpoint for your secondary topic.
  • primaryTopicKey - the key for your primary topic.
  • secondaryTopicKey - the key for your secondary topic.
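Instead of pasting these four values into source code, you could read them from environment variables. The sketch below does this. The `FailoverTopicSettings` class and the `EVENTGRID_*` variable names are hypothetical choices for illustration, not names that Event Grid defines.

```csharp
using System;

// Hypothetical settings holder: loads the sample's endpoints and keys from
// environment variables so that keys don't end up in source control.
public sealed class FailoverTopicSettings
{
    public string PrimaryTopic { get; private set; }
    public string SecondaryTopic { get; private set; }
    public string PrimaryTopicKey { get; private set; }
    public string SecondaryTopicKey { get; private set; }

    public static FailoverTopicSettings FromEnvironment() => new FailoverTopicSettings
    {
        PrimaryTopic = Require("EVENTGRID_PRIMARY_TOPIC"),
        SecondaryTopic = Require("EVENTGRID_SECONDARY_TOPIC"),
        PrimaryTopicKey = Require("EVENTGRID_PRIMARY_TOPIC_KEY"),
        SecondaryTopicKey = Require("EVENTGRID_SECONDARY_TOPIC_KEY"),
    };

    // Fails fast with a clear message if a variable is missing or empty.
    private static string Require(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable {name} is not set.");
        }
        return value;
    }
}
```

In the sample's `Main`, you would then replace the four string literals with `settings.PrimaryTopic`, `settings.SecondaryTopic`, `settings.PrimaryTopicKey`, and `settings.SecondaryTopicKey`.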

Try running the event publisher. You should see your test events arrive in your Event Grid viewer, delivered through the primary event subscription.

To make sure your failover is working, change a few characters in your primary topic key so that it's no longer valid. Then run the publisher again. You should still see new events appear in your Event Grid viewer, but your console output shows that they're now being published via the secondary topic.

Possible extensions

There are many ways to extend this sample based on your needs. For high-volume scenarios, you may want to check each topic's health API regularly and independently of publishing. That way, if a topic goes down, you don't need to check it on every single publish. Once you know a topic isn't healthy, you can default to publishing to the secondary topic.
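Here is a minimal sketch of that independent check. It assumes a probe delegate, such as a GET on /api/health, that returns `true` for 200 OK and `false` on any failure. `TopicHealthCache` and its members are hypothetical. The probing interval and the rule of treating both topics as healthy until the first probe are design choices you would tune.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical health cache for high-volume publishers: a background loop probes
// both topics on an interval, and the publish path only reads a cached flag.
public sealed class TopicHealthCache
{
    // Assumed to return true for 200 OK and false (not throw) on any failure.
    private readonly Func<Uri, Task<bool>> _probe;
    private readonly Uri _primaryHealthProbe;
    private readonly Uri _secondaryHealthProbe;

    // Both topics are treated as healthy until the first probe says otherwise.
    private volatile bool _primaryHealthy = true;
    private volatile bool _secondaryHealthy = true;

    public TopicHealthCache(Func<Uri, Task<bool>> probe, Uri primaryHealthProbe, Uri secondaryHealthProbe)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
        _primaryHealthProbe = primaryHealthProbe;
        _secondaryHealthProbe = secondaryHealthProbe;
    }

    public bool PrimaryHealthy => _primaryHealthy;
    public bool SecondaryHealthy => _secondaryHealthy;

    // Publish to the primary while it's healthy; otherwise default to the secondary.
    public bool UsePrimary => _primaryHealthy;

    // Probes both topics once and stores the results.
    public async Task ProbeOnceAsync()
    {
        _primaryHealthy = await _probe(_primaryHealthProbe);
        _secondaryHealthy = await _probe(_secondaryHealthProbe);
    }

    // Repeats ProbeOnceAsync every interval until the token is cancelled.
    public async Task RunAsync(TimeSpan interval, CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            await ProbeOnceAsync();
            try
            {
                await Task.Delay(interval, cancellationToken);
            }
            catch (OperationCanceledException)
            {
                break;
            }
        }
    }
}
```

You would start `RunAsync` once at startup with an interval of your choosing. The publish path would then pick the primary or secondary topic based on `UsePrimary` instead of probing before each publish.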

Similarly, you may want to implement failback logic based on your specific needs. If publishing to the closest data center is critical for reducing latency, you can periodically probe the health API of a topic that has failed over. Once it's healthy again, you know it's safe to fail back to the closer data center.
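One way to make failback explicit is a small policy object that records when you've failed over and switches back to the primary after its health probe succeeds. In the hypothetical `FailbackPolicy` below, the default of one healthy probe matches the behavior described above. Requiring several consecutive healthy probes is an optional assumption meant to prevent flapping between regions.

```csharp
using System;

// Hypothetical failback policy: tracks whether the publisher has failed over to the
// secondary topic and decides when it's safe to return to the primary.
public sealed class FailbackPolicy
{
    private readonly int _requiredHealthyProbes;
    private int _consecutiveHealthy;

    public FailbackPolicy(int requiredHealthyProbes = 1)
    {
        if (requiredHealthyProbes < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(requiredHealthyProbes));
        }
        _requiredHealthyProbes = requiredHealthyProbes;
    }

    // True while events should go to the secondary topic.
    public bool FailedOver { get; private set; }

    // Call when a publish to the primary topic fails.
    public void OnPrimaryFailure()
    {
        FailedOver = true;
        _consecutiveHealthy = 0;
    }

    // Call with each periodic /api/health result for the primary topic.
    // Returns true when this probe triggers failback to the primary.
    public bool OnPrimaryProbe(bool healthy)
    {
        if (!FailedOver)
        {
            return false;
        }

        // An unhealthy probe resets the streak.
        _consecutiveHealthy = healthy ? _consecutiveHealthy + 1 : 0;
        if (_consecutiveHealthy < _requiredHealthyProbes)
        {
            return false;
        }

        FailedOver = false;
        _consecutiveHealthy = 0;
        return true;
    }
}
```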

Next steps