Reliability in Azure Event Hubs
This article describes reliability support in Azure Event Hubs, and covers both intra-regional resiliency with availability zones and cross-region disaster recovery and business continuity. For a more detailed overview of reliability principles in Azure, see Azure reliability.
Availability zone support
Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.
For more information on availability zones in Azure, see What are availability zones?.
Event Hubs implements transparent failure detection and failover mechanisms so that, when a failure occurs, the service continues to operate within its assured service levels and without noticeable interruption. If you create an Event Hubs namespace in a region that supports availability zones, zone redundancy is automatically enabled. Zone redundancy increases fault tolerance, and the service keeps enough reserve capacity to cope with the outage of an entire facility. Both metadata and data (events) are replicated across datacenters in the availability zones.
Prerequisites
Availability zone support is only available in Azure regions with availability zones.
Create a resource with availability zones enabled
When you create a namespace in the Azure portal, zone redundancy is automatically enabled. The portal shows a highlighted message when you select a region that supports availability zones.
Disable availability zones
The Azure portal doesn't support disabling availability zones. To create a namespace with zone redundancy disabled, use one of the following methods:
Azure CLI: the az eventhubs namespace command with --zone-redundant=false
PowerShell: the New-AzEventHubNamespace command with -ZoneRedundant=false
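As a minimal sketch of the Azure CLI method, the following Python helper assembles the command line for creating a namespace with zone redundancy turned off. The helper name, resource group, namespace name, and region are hypothetical placeholders. The create subcommand and the space-separated form of the --zone-redundant flag are assumptions; confirm them with az eventhubs namespace create --help for your CLI version.

```python
import shlex


def build_create_namespace_command(resource_group, namespace, location,
                                   zone_redundant=False):
    """Return the argument list for creating an Event Hubs namespace.

    All values passed in are placeholders; nothing is executed here.
    """
    return [
        "az", "eventhubs", "namespace", "create",
        "--resource-group", resource_group,
        "--name", namespace,
        "--location", location,
        # Assumed flag form; the article writes it as --zone-redundant=false.
        "--zone-redundant", "true" if zone_redundant else "false",
    ]


if __name__ == "__main__":
    args = build_create_namespace_command(
        "example-rg", "example-namespace", "eastus")
    # Print the command for review instead of running it.
    print(shlex.join(args))
```

To actually run the command, the list can be passed to subprocess.run(args, check=True) on a machine where the Azure CLI is installed and signed in.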
Availability zone migration
When you create an Event Hubs namespace in a region that supports availability zones, zone redundancy is automatically enabled.
Cross-region disaster recovery and business continuity
Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.
When it comes to DR, Azure uses the shared responsibility model. In a shared responsibility model, Azure ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to another enabled region. For those services, you're responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR, and you can use these service-specific features to support fast recovery as you develop your DR plan.
The all-active Azure Event Hubs cluster model with availability zone support provides resiliency against hardware and datacenter outages. However, in a disaster where an entire region and all of its zones are unavailable, you can use geo-disaster recovery to recover your workload and application configuration.
One feature provides geo-disaster recovery in Azure Event Hubs:
Geo-disaster recovery (Metadata DR), which replicates only metadata.
Geo-disaster recovery ensures that the entire configuration of a namespace (event hubs, consumer groups, and settings) is continuously replicated from a primary namespace to a secondary namespace while the two are paired.
The Geo-disaster recovery feature of Azure Event Hubs is a disaster recovery solution. The concepts and workflow described in this article apply to disaster scenarios, and not to temporary outages. For a detailed discussion of disaster recovery in Azure, see this article.
With geo-disaster recovery, you can initiate a one-time failover from the primary to the secondary at any time. The failover repoints the chosen alias name for the namespace to the secondary namespace. After the failover, the pairing is removed. The failover is nearly instantaneous once initiated.
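The pairing and failover behavior described above can be illustrated with a small in-memory model. This is a conceptual sketch only, not the Event Hubs API: the Namespace and GeoPairing classes, their fields, and their method names are hypothetical, and namespaces are represented as plain Python objects. It shows three points from the text: only metadata is replicated, failover repoints the alias to the secondary, and the pairing is removed afterward.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Namespace:
    name: str
    # Metadata: event hub names mapped to their consumer groups.
    metadata: dict = field(default_factory=dict)
    # Event payloads; not replicated by metadata DR.
    events: list = field(default_factory=list)


class GeoPairing:
    """Hypothetical model of an alias that pairs two namespaces."""

    def __init__(self, alias, primary, secondary):
        self.alias = alias
        self.primary = primary
        self.secondary = secondary
        self.target = primary      # the alias initially points at the primary
        self.paired = True

    def replicate(self):
        # Copy configuration only; events stay in the primary.
        if self.paired:
            self.secondary.metadata = copy.deepcopy(self.primary.metadata)

    def fail_over(self):
        # One-time move: repoint the alias, then remove the pairing.
        if not self.paired:
            raise RuntimeError("pairing already removed; failover is one-time")
        self.target = self.secondary
        self.paired = False


if __name__ == "__main__":
    primary = Namespace("ns-primary", {"orders": ["$Default", "billing"]}, ["event-1"])
    secondary = Namespace("ns-secondary")
    pairing = GeoPairing("orders-alias", primary, secondary)
    pairing.replicate()
    pairing.fail_over()
    print(pairing.target.name, pairing.target.metadata, pairing.target.events)
```

In this model, clients that connect through the alias reach the secondary namespace after failover without changing their configuration, but events that were held only in the primary are not available there.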
For detailed information, samples, and further documentation on geo-disaster recovery in Event Hubs, see Azure Event Hubs - Geo-disaster recovery.