Client-side failover implementation in Azure Event Grid

Article
11/30/2023

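Disaster recovery typically involves creating a backup resource to prevent interruptions when a region becomes unhealthy. For this, your workload needs Azure Event Grid resources in both a primary region and a secondary region.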

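There are different ways to recover from a severe loss of application functionality. This article describes the checklist to follow to prepare your client so that it can recover from a failure caused by an unhealthy resource or region.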

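Event Grid supports both manual and automatic geo disaster recovery (GeoDR) on the server side. You can still implement client-side disaster recovery logic if you want greater control over the failover process. For details about automatic GeoDR, see Server-side geo disaster recovery in Azure Event Grid.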

The following table shows the client-side failover and geo disaster recovery support in Event Grid.

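Event Grid resource	Client-side failover support	Geo disaster recovery (GeoDR) support
Custom topics	Supported	Cross-geo / regional
System topics	Not supported	Enabled automatically
Domains	Supported	Cross-geo / regional
Partner namespaces	Supported	Not supported
Namespaces	Supported	Not supported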

Client-side failover considerations

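Create and configure your primary Event Grid resource.
Create and configure your secondary Event Grid resource.
Keep in mind that both resources must have the same configuration, subresources, and capabilities enabled.
The two Event Grid resources must be hosted in different regions.
If the Event Grid resource has dependent resources, such as a storage resource for dead-lettering, place them in the same region used by the secondary Event Grid resource.
Test your endpoints regularly to make sure that the resources in your recovery plan are in place and functioning correctly.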
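
The requirement that the two resources live in different regions can be checked from the endpoints themselves. The following is a minimal sketch, assuming the topic host follows the <topic-name>.<topic-region>.eventgrid.azure.cn pattern used in this article. EndpointChecks, GetRegion, and AreInDifferentRegions are hypothetical names for illustration and are not part of any Event Grid SDK.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Event Grid SDK.
static class EndpointChecks
{
    // Returns the region label of a topic endpoint, assuming the host has the
    // form <topic-name>.<topic-region>.eventgrid.azure.cn.
    public static string GetRegion(Uri topicEndpoint)
    {
        string[] labels = topicEndpoint.Host.Split('.');
        if (labels.Length < 3)
        {
            throw new ArgumentException($"Unexpected topic host: {topicEndpoint.Host}");
        }

        return labels[1];
    }

    // True when the two endpoints are hosted in different regions.
    public static bool AreInDifferentRegions(Uri primary, Uri secondary)
    {
        return !string.Equals(
            GetRegion(primary),
            GetRegion(secondary),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

A publisher could call AreInDifferentRegions once at startup and refuse to run if it returns false, since a secondary topic in the same region gives no protection against a regional outage.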

Basic client-side failover implementation sample for custom topics

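The following sample code is a simple .NET publisher that always tries to publish to your primary topic first. If that fails, it fails over to the secondary topic. In either case, it also checks the health API of the other topic by doing a GET on https://<topic-name>.<topic-region>.eventgrid.azure.cn/api/health. A healthy topic should always respond with 200 OK to a GET on the /api/health endpoint.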

Note

The following sample code is for demonstration purposes only and is not intended for production use.

using System;
using System.Net.Http;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure;
using Azure.Messaging.EventGrid;

namespace EventGridFailoverPublisher
{
    // This captures the "Data" portion of an EventGridEvent on a custom topic
    class FailoverEventData
    {
        public string TestStatus { get; set; }
    }

    class Program
    {
        static async Task Main(string[] args)
        {
            // TODO: Enter the endpoint for each topic. You can find this topic endpoint value
            // in the "Overview" section in the "Event Grid topics" page in Azure Portal.
            string primaryTopic = "https://<primary-topic-name>.<primary-topic-region>.eventgrid.azure.cn/api/events";
            string secondaryTopic = "https://<secondary-topic-name>.<secondary-topic-region>.eventgrid.azure.cn/api/events";

            // TODO: Enter topic key for each topic. You can find this in the "Access Keys" section in the
            // "Event Grid topics" page in Azure Portal.
            string primaryTopicKey = "<your-primary-topic-key>";
            string secondaryTopicKey = "<your-secondary-topic-key>";

            Uri primaryTopicUri = new Uri(primaryTopic);
            Uri secondaryTopicUri = new Uri(secondaryTopic);

            Uri primaryTopicHealthProbe = new Uri($"https://{primaryTopicUri.Host}/api/health");
            Uri secondaryTopicHealthProbe = new Uri($"https://{secondaryTopicUri.Host}/api/health");

            var httpClient = new HttpClient();

            try
            {
                var client = new EventGridPublisherClient(primaryTopicUri, new AzureKeyCredential(primaryTopicKey));

                await client.SendEventsAsync(GetEventsList());
                Console.WriteLine("Published events to primary Event Grid topic.");

                // Probe the other (secondary) topic. Await the call instead of blocking on .Result.
                HttpResponseMessage health = await httpClient.GetAsync(secondaryTopicHealthProbe);
                Console.WriteLine($"Secondary topic health: {health.StatusCode}");
            }
            catch (RequestFailedException ex)
            {
                var client = new EventGridPublisherClient(secondaryTopicUri, new AzureKeyCredential(secondaryTopicKey));

                await client.SendEventsAsync(GetEventsList());
                Console.WriteLine("Published events to secondary Event Grid topic. Reason for primary topic failure:\n\n" + ex);

                // Probe the other (primary) topic to see whether it is still reachable.
                HttpResponseMessage health = await httpClient.GetAsync(primaryTopicHealthProbe);
                Console.WriteLine($"Primary topic health: {health.StatusCode}");
            }

            Console.ReadLine();
        }

        static IList<EventGridEvent> GetEventsList()
        {
            List<EventGridEvent> eventsList = new List<EventGridEvent>();

            for (int i = 0; i < 5; i++)
            {
                eventsList.Add(new EventGridEvent(
                    subject: "test" + i,
                    eventType: "Contoso.Failover.Test",
                    dataVersion: "2.0",
                    data: new FailoverEventData
                    {
                        TestStatus = "success"
                    }));
            }

            return eventsList;
        }
    }
}

Try it out

Now that all of your components are in place, you can test your failover implementation.

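To make sure your failover works, change a few characters in your primary topic key so that the key is no longer valid, then run the publisher again. The sample events continue to flow through Event Grid, but when you look at your client, you see that they are now being published through the secondary topic.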

Possible extensions

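There are many ways to extend this sample based on your needs. For high-volume scenarios, you might want to check each topic's health API on a regular schedule, independently of publishing. That way, if a topic goes down, you don't need to check the health API on every publish. Once you know a topic is unhealthy, you can default to publishing to the secondary topic.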
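
One possible shape for such an independent health check is sketched below. It assumes, as stated earlier in this article, that a healthy topic answers a GET on /api/health with 200 OK. TopicHealthMonitor and its members are hypothetical names, and the polling interval is left to the caller.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration: polls a topic's /api/health endpoint
// and caches the latest result so publishers don't probe on every publish.
class TopicHealthMonitor
{
    private readonly HttpClient _httpClient;
    private readonly Uri _healthProbe;
    private volatile bool _isHealthy = true; // assume healthy until a probe says otherwise

    public TopicHealthMonitor(HttpClient httpClient, Uri healthProbe)
    {
        _httpClient = httpClient;
        _healthProbe = healthProbe;
    }

    // Result of the most recent probe.
    public bool IsHealthy => _isHealthy;

    // Runs one probe. Anything other than 200 OK, a transport error,
    // or a request timeout counts as unhealthy.
    public async Task ProbeOnceAsync(CancellationToken cancellationToken = default)
    {
        try
        {
            using HttpResponseMessage response =
                await _httpClient.GetAsync(_healthProbe, cancellationToken);
            _isHealthy = response.StatusCode == HttpStatusCode.OK;
        }
        catch (HttpRequestException)
        {
            _isHealthy = false;
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // HttpClient reports its own timeout as a cancellation.
            _isHealthy = false;
        }
    }

    // Probes repeatedly, waiting `interval` between probes. Ends by throwing
    // OperationCanceledException when the token is cancelled.
    public async Task RunAsync(TimeSpan interval, CancellationToken cancellationToken)
    {
        while (true)
        {
            await ProbeOnceAsync(cancellationToken);
            await Task.Delay(interval, cancellationToken);
        }
    }
}
```

A publisher would start RunAsync once in the background for the primary topic and read IsHealthy before each publish, choosing the secondary topic whenever it is false.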

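Similarly, you might want to implement failback logic based on your specific needs. If publishing to the closest datacenter is critical for reducing latency, you can periodically probe the health API of the topic you failed over from. Once that topic is healthy again, it's safe to fail back to the closer datacenter.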
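
The failback decision can be kept separate from the probing itself. The sketch below is one way to do that and makes an assumption that is not in this article: it waits for several consecutive healthy probes before failing back, so that a single good response from a flapping topic doesn't move traffic. FailbackSelector, its members, and the default threshold of 3 are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration: decides which topic endpoint a
// publisher should currently target. Not thread-safe; guard it with a lock
// if probes and publishes run on different threads.
class FailbackSelector
{
    private readonly Uri _primary;
    private readonly Uri _secondary;
    private readonly int _requiredHealthyProbes;
    private int _consecutiveHealthyProbes;
    private bool _failedOver;

    public FailbackSelector(Uri primary, Uri secondary, int requiredHealthyProbes = 3)
    {
        if (requiredHealthyProbes < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(requiredHealthyProbes));
        }

        _primary = primary;
        _secondary = secondary;
        _requiredHealthyProbes = requiredHealthyProbes;
    }

    // Endpoint to publish to right now.
    public Uri Current => _failedOver ? _secondary : _primary;

    // Call when a publish to the primary topic fails.
    public void ReportPrimaryFailure()
    {
        _failedOver = true;
        _consecutiveHealthyProbes = 0;
    }

    // Call with the result of each periodic health probe of the primary topic.
    public void ReportPrimaryProbe(bool healthy)
    {
        if (!healthy)
        {
            // An unhealthy probe fails over and resets the healthy streak.
            ReportPrimaryFailure();
            return;
        }

        if (!_failedOver)
        {
            return;
        }

        _consecutiveHealthyProbes++;
        if (_consecutiveHealthyProbes >= _requiredHealthyProbes)
        {
            // Enough healthy probes in a row: fail back to the primary topic.
            _failedOver = false;
            _consecutiveHealthyProbes = 0;
        }
    }
}
```

With requiredHealthyProbes set to 1, the selector fails back on the first healthy probe, which matches the simpler behavior described above.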
