In Azure, a zonal resource is a resource that is pinned to a single zone. Because a zonal resource is in a single availability zone, it isn't zone resilient. If the zone containing the resource has a problem, then the resource is likely to experience downtime.
Some Azure services either require or allow you to deploy zonal resources. You may choose to deploy a resource zonally due to latency considerations or specific service requirements. You might even choose to pin groups of resources to a single zone.
This article describes scenarios where you might choose to deploy zonal (single-zone) resources instead of zone-redundant resources, and the considerations and responsibilities that you take on to make your solution resilient to zone outages.
Resource deployment types
In Azure, only some deployment types provide zone resiliency. The following table describes three resource deployment types. For each type, it shows the zone resiliency support, how resources are distributed across zones, how to configure the deployment, and recommendations for its use:
| Resource deployment type | Zone resiliency support | Zone distribution | How to configure | Recommendation |
|---|---|---|---|---|
| Zone-redundant | Always zone-resilient | Zone-redundant resources are spread across multiple zones and are resilient to zone failures. If a failure occurs in one zone, the service can continue operation in other zones. | Some zone-redundant resources provide automatic zone redundancy across availability zones, while others require you to manually enable zone redundancy. Check your service's reliability guidance to see what your service requires to enable resiliency. | Always use zone-redundant resources wherever possible, especially in production deployments. |
| Zonal | Not automatic. It's your responsibility to enable zone resiliency if you choose to. Zonal resources are isolated from faults in other zones, but a failure of their own zone can cause downtime. | You select the zone for the resource. | If you have multiple resources that need to be zone-aligned (placed in the same zone), you need to configure the same zone on each resource. | You should only use zonal resources when there's a compelling need to do so. To make your solution zone-resilient, it's your responsibility to design and implement a multi-zone solution. |
| Nonzonal (regional) | None | If the region offers availability zone support, Azure might use any zone in the region. | There is no zone configuration offered for nonzonal resources. | Because nonzonal resources cannot be made zone resilient, avoid nonzonal deployments for all production workloads in regions with availability zones. |
For more information on the three types of resource deployments and availability zones in general, see What are availability zones?
Workloads that combine zone-redundant and zonal resources
Many workloads combine both zonal and zone-redundant resources. For example, your workload might include a set of zonal VMs for your database tier, a zone-redundant web server hosted on Azure App Service, and a zone-redundant load balancer to send traffic to your database VMs.
When you combine zonal and zone-redundant resources into a workload, it's important that you consider how each resource, and your whole solution, behaves if any availability zone has a problem. Typically, zone-redundant services automatically recover from zone outages with minimal or no data loss, and Azure manages the entire process. For zonal resources, you're responsible for configuring automated failover or performing any manual recovery activities. To find out how each service behaves during zone-down scenarios, to understand your responsibilities and Azure's responsibilities, and to learn how to monitor the health of services for zone-down events, check your service's reliability guide.
When to use a zonal deployment
You should only use zonal resources when there's a compelling need to do so. Common reasons for a single-zone deployment are when you deploy a resource that must be zonal, when your specific service is only available in a particular zone, or when your workload is unusually sensitive to inter-zone latency.
Important
Some Azure services give you an option for zonal or zone-redundant deployments. If you don't have a strong reason to use a zonal deployment, you should use a zone-redundant deployment.
Resources that require zonal deployments
A small number of Azure services only support zonal deployments, and don't provide zone-redundant deployments.
Virtual machines are an example of a zonal resource. However, you can use virtual machine scale sets to create sets of virtual machines. Scale sets can be made zone-redundant, which means the VMs in the set are spread across multiple zones. Scale sets are a good way to achieve zone resiliency for many VM-based workloads.
Tip
If you deploy multiple VMs that perform similar functions, we recommend you use zone-redundant scale sets instead of single-instance VMs that you deploy individually.
Some services provide options that are available only in specific zones. For example, certain VM types that use advanced GPUs might be available only in particular zones within a region, which means that they can't be deployed across multiple zones. To check which regions and zones support the VM types you need, use these resources:
- Products available by region to check the VM types available in each region.
- Check VM SKU availability to check the supported VM types and sizes within each zone of a specific region.
If the VM type you need is only available in a single zone within the region you use, you might need to consider a zonal deployment for that VM, and then find other ways to make the VM resilient to zone outages. However, you should still ensure the other parts of your solution are zone-resilient.
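As a rough illustration of that check, the following Python sketch finds VM sizes that are offered in only one zone of a region. It assumes you already have SKU data shaped like the JSON that `az vm list-skus --location <region> --output json` returns, where each entry has a `name` and a `locationInfo` list with `location` and `zones` fields; verify that shape against your own CLI output. The SKU names, region name, and zone values in `SAMPLE_SKUS` are hypothetical placeholders, not real availability data, and the sketch ignores subscription-level restrictions.

```python
from typing import Dict, List


def zones_by_sku(skus: List[dict], region: str) -> Dict[str, List[str]]:
    """Map each SKU name to the sorted list of zones that offer it in `region`."""
    result: Dict[str, List[str]] = {}
    for sku in skus:
        for info in sku.get("locationInfo", []):
            # Region names are compared case-insensitively.
            if info.get("location", "").lower() == region.lower():
                result[sku["name"]] = sorted(info.get("zones", []))
    return result


def single_zone_skus(skus: List[dict], region: str) -> List[str]:
    """Return the SKUs that are offered in exactly one zone of `region`."""
    return [
        name
        for name, zones in zones_by_sku(skus, region).items()
        if len(zones) == 1
    ]


# Hypothetical sample data for illustration only.
SAMPLE_SKUS = [
    {
        "name": "Example_GPU_Size",
        "locationInfo": [{"location": "exampleregion", "zones": ["2"]}],
    },
    {
        "name": "Example_General_Size",
        "locationInfo": [{"location": "exampleregion", "zones": ["3", "1", "2"]}],
    },
]

if __name__ == "__main__":
    print(single_zone_skus(SAMPLE_SKUS, "exampleregion"))
```

Any size that this kind of check reports as single-zone is a candidate for a zonal deployment, with the resiliency responsibilities that this article describes.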
To see which services require or support zonal deployments, see Azure services with availability zone support.
Inter-zone latency
If you have an extremely latency-sensitive workload, you might choose to use zonal resources instead of zone-redundant resources, even if a service supports zone-redundant deployments.
Availability zones are connected by a low-latency network, and inter-zone round-trip latency is typically less than 2 milliseconds. For most workloads, inter-zone latency isn't a concern, and the resiliency benefits of spreading resources across availability zones significantly outweigh the negligible performance effect of sending traffic between zones. However, there are a small number of workloads that are unusually sensitive to inter-zone latency. These workloads might include:
Legacy on-premises applications. Some legacy workloads can contain applications that were originally designed for an on-premises environment. These workloads assume that components (other applications, databases, and other services) are colocated on the same host or are physically adjacent to each other.
Very high-scale synchronous replication. Occasionally, stateful applications and databases perform very large numbers of writes with synchronous replication. Synchronous replication means that data is written to multiple replicas before the write operation is considered complete. Distributing replicas across availability zones improves resiliency, but when you use synchronous replication, inter-zone latency can increase the write latency of the workload. Most of the time the increased latency isn't consequential, but because of how some applications are designed, the extra latency can sometimes become problematic at high scale.
Important
It's unusual for workloads to be sensitive to inter-zone latency. Don't assume your workload is affected unless you test the latency for your specific workload and needs.
If you suspect that your workload might be affected by inter-zone latency, you should follow these steps to test the impact for your specific workload in a realistic test environment:
Define acceptable performance requirements. Inter-zone traffic adds a small amount of latency, but it's usually not noticeable or impactful for most workloads. Define what acceptable performance looks like for your workload.
Run a performance test within a single availability zone. Establish a set of baseline performance metrics.
Important
Test your actual workload, including your applications, protocols, configuration, and Azure region. Use realistic load. It's not enough to review benchmarks or perform synthetic tests, because they don't show how your particular solution behaves.
Enable inter-zone replication. Depending on the components you use, you might enable zone redundancy, or you might move replicas between zones.
Re-run performance tests. Collect the same metrics you collected earlier.
Compare the performance impact against your requirements. Use your requirements and the performance data to make an informed decision about the tradeoff between latency and resiliency to zone outages.
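The comparison in the final step can be sketched in a few lines of Python. This is a minimal example, not a prescribed method: it assumes you exported per-request latencies in milliseconds from both test runs, and that your requirement is expressed as a maximum 99th percentile latency. The function names and the choice of percentile are illustrative assumptions.

```python
import statistics
from typing import Dict, Sequence, Union


def percentile(samples: Sequence[float], pct: int) -> float:
    """Return the pct-th percentile (1 to 99) of the latency samples."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    return statistics.quantiles(samples, n=100, method="inclusive")[pct - 1]


def compare_runs(
    baseline_ms: Sequence[float],
    cross_zone_ms: Sequence[float],
    max_p99_ms: float,
) -> Dict[str, Union[float, bool]]:
    """Compare a single-zone baseline with a cross-zone run against a p99 limit."""
    base_p99 = percentile(baseline_ms, 99)
    cross_p99 = percentile(cross_zone_ms, 99)
    return {
        "baseline_p99_ms": base_p99,
        "cross_zone_p99_ms": cross_p99,
        "added_p99_ms": cross_p99 - base_p99,
        "meets_requirement": cross_p99 <= max_p99_ms,
    }
```

You would call `compare_runs` with the samples from the single-zone test and the multi-zone test, plus the limit you defined in the first step. If `meets_requirement` is true, the added latency is within your requirements and a zone-redundant design remains the better choice.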
If the test demonstrates that the latency is unacceptably high for your workload, consider taking the following actions:
Try using another set of zones. There can be slight variability in the latency between different zones because zones can have different physical distances from each other.
Tip
If you're testing across Azure subscriptions, review the logical to physical zone mapping to ensure you test the sets of physical zones you expect.
If there's another Azure region that meets your overall needs for data residency and other factors, try using multiple zones in that region.
Consider whether you can redesign your application to reduce the amount of inter-zone communication required. For example, you might be able to consolidate multiple small database operations together into one operation. This approach can reduce the impact that the latency has on your workload.
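To see why consolidating operations can help, consider a simplified model in which operations are sent one after another and each request waits for one inter-zone round trip. The following Python sketch is an estimate under those assumptions only. It ignores processing time, payload size, and concurrency, and it uses a hypothetical 2-millisecond round trip as an illustrative upper bound.

```python
import math


def total_round_trip_ms(operations: int, batch_size: int, rtt_ms: float) -> float:
    """Estimate cumulative network wait for sequential requests sent in batches."""
    if operations < 0:
        raise ValueError("operations must not be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Each batch costs one round trip, however many operations it carries.
    round_trips = math.ceil(operations / batch_size)
    return round_trips * rtt_ms


if __name__ == "__main__":
    unbatched = total_round_trip_ms(operations=1000, batch_size=1, rtt_ms=2.0)
    batched = total_round_trip_ms(operations=1000, batch_size=100, rtt_ms=2.0)
    print(f"unbatched: {unbatched} ms, batched: {batched} ms")
```

In this model, 1,000 individual operations wait for 1,000 round trips, while the same operations grouped into batches of 100 wait for only 10. Your real workload will behave differently, so treat the model as a reason to test batching rather than as a prediction.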
If none of these actions help, you might choose to run the specific workload or components within a single availability zone by using zonal VMs and other supported Azure services. You then take responsibility for making the zonal components resilient to zone outages. Review the rest of this article to understand your responsibilities and some approaches to consider.
Your responsibilities for a zonal deployment
A zonal resource is at risk of downtime whenever its availability zone experiences an outage. When you deploy a zonal resource, you're responsible for making your workload resilient to zone-level failures.
Important
Zonal resources are not inherently resilient to zone failures. You must design ways to mitigate the risk of a zone failure by developing a comprehensive plan that includes zone-down scenarios.
When you deploy zonal resources, to make them zone-resilient you need to consider the following responsibilities:
Deployment and configuration of multiple resources. You should deploy separate zonal resources manually into different zones or regions. Determine how to keep configuration consistent across each resource. It's a good practice to use infrastructure as code (IaC) approaches, which help you to quickly deploy multiple identical resources.
Traffic routing and distribution. You need to select a load balancer component, making sure it's zone-resilient itself, and configure it to send traffic between the resources in different zones. You typically configure the routing policy (for example, active-active or active-passive), automated health checks, and failover processes. For more information about the load balancing services available in Azure, see Load balancing options.
Replication or data backup. For any stateful resources, it's your responsibility to protect the data they store and ensure that it's safely stored in multiple zones. Commonly, you configure replication to another service instance in another availability zone. In some situations you might rely on backups instead. However, restoring from backups takes longer during a zone failure, so you need to accept a higher recovery time objective (RTO). Backups also typically result in more data loss, so you need to accept a higher recovery point objective (RPO).
Zone failure detection and response process implementation. You need to determine how to monitor the health of zonal resources, decide when to consider them unhealthy, and then trigger response actions like restoring operations in another zone or region.
Zone recovery processes. After the zone recovers, you're responsible for performing any recovery actions required, such as failing back to resources in the primary zone.
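The detection and response responsibilities in the preceding list can be illustrated with a small Python sketch. It assumes a simple policy: a zone is considered unhealthy after a number of consecutive failed health probes, and traffic goes to the first healthy zone in a preference order. The class name, the threshold of three, and the policy itself are hypothetical choices for illustration. In practice, your load balancer's health probes or your monitoring system usually implement this logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ZoneHealthTracker:
    """Tracks consecutive probe failures per zone and picks the active zone."""

    zones: List[str]              # Preference order, primary zone first.
    failure_threshold: int = 3    # Hypothetical value; tune for your RTO.
    _failures: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def record_probe(self, zone: str, healthy: bool) -> None:
        """Record one health probe result for a zone."""
        if zone not in self.zones:
            raise KeyError(zone)
        # A healthy probe resets the counter; a failed probe increments it.
        self._failures[zone] = 0 if healthy else self._failures.get(zone, 0) + 1

    def is_healthy(self, zone: str) -> bool:
        """A zone is healthy until it reaches the consecutive failure threshold."""
        return self._failures.get(zone, 0) < self.failure_threshold

    def active_zone(self) -> Optional[str]:
        """Return the first healthy zone in preference order, or None."""
        for zone in self.zones:
            if self.is_healthy(zone):
                return zone
        return None


if __name__ == "__main__":
    tracker = ZoneHealthTracker(zones=["1", "2"])
    for _ in range(3):
        tracker.record_probe("1", healthy=False)
    print(tracker.active_zone())
```

This sketch fails back to the primary zone as soon as a probe succeeds. Many real solutions fail back deliberately instead, after confirming that the recovered zone is stable and that data is synchronized.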
Common approaches for zonal deployment resiliency
To make informed decisions about the best way to achieve zone resiliency with your zonal resources, consider the following factors:
Review your whole workload. Understand how each component behaves in zone-down events, including zone-redundant and zonal resources, and any nonregional resources. Use the reliability guide for each service to understand the details of how each service works during zone-down scenarios, and how to monitor the health of services for zone-down events.
Understand the allowable data loss during a zone failure. Your RPO specifies how much data loss you're prepared to accept.
Many of Azure's zone-redundant resources provide an RPO of zero for zone failures, which means no data loss occurs. They typically achieve this RPO by synchronously replicating all changes across zones.
When you plan a zonal deployment, you need to ensure that you can meet your workload's RPO requirements when a zone fails.
Understand the allowable downtime during a zone failure. Your RTO specifies how much downtime you're prepared to accept.
Azure's zone-redundant resources typically provide a very low RTO for zone failures, and usually require just a few seconds of downtime.
When you plan a zonal deployment, you need to ensure that you can meet your workload's RTO requirements. If you have a low RTO, you might need to rely on automated detection and recovery processes. A higher RTO provides more flexibility for your response processes.
Understand the cost. Zonal resources are typically billed individually, so deploying multiple zonal resources can increase your resource cost.
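One way to reason about the RPO and RTO factors is to estimate the worst case for each recovery approach and compare it with your objectives. The following Python sketch assumes that worst-case data loss equals the backup interval or replication lag, and that worst-case downtime equals detection time plus recovery time. All names and numbers are hypothetical placeholders. Replace them with measurements from your own failover tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryApproach:
    """Worst-case estimates, in minutes, for one zone-failure recovery approach."""

    name: str
    worst_case_data_loss_min: float  # For example, backup interval or replication lag.
    detection_min: float             # Time to notice that the zone is unhealthy.
    recovery_min: float              # Time to fail over or restore in another zone.

    @property
    def worst_case_downtime_min(self) -> float:
        return self.detection_min + self.recovery_min


def meets_objectives(approach: RecoveryApproach, rpo_min: float, rto_min: float) -> bool:
    """Return True when the approach stays within both the RPO and the RTO."""
    return (
        approach.worst_case_data_loss_min <= rpo_min
        and approach.worst_case_downtime_min <= rto_min
    )


# Hypothetical estimates for illustration only.
NIGHTLY_BACKUP = RecoveryApproach("nightly backup", 1440.0, 15.0, 240.0)
SYNC_REPLICA = RecoveryApproach("synchronous replica, automated failover", 0.0, 1.0, 2.0)

if __name__ == "__main__":
    for approach in (NIGHTLY_BACKUP, SYNC_REPLICA):
        print(approach.name, meets_objectives(approach, rpo_min=5.0, rto_min=30.0))
```

With a hypothetical 5-minute RPO and 30-minute RTO, only the replicated approach in this example qualifies. With objectives measured in days, the backup approach qualifies too, and it's usually less expensive to run.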
Design a zonal deployment for resiliency
When you design your zonal deployment for resiliency, you need to consider whether you're using availability zones to achieve high availability or disaster recovery. The distinction between these two concepts is based on your RTO and RPO requirements.
If you have low RTO and RPO requirements, you need to treat availability zones as a high availability construct. However, if your RTO and RPO are higher, you might choose to treat availability zones as a disaster recovery construct. For more information about these concepts, see What are business continuity, high availability, and disaster recovery? Your workload tier can help you to determine your requirements and necessary actions.
Design for high availability
Consider deploying your own highly available architecture across multiple zones. A highly available architecture requires automated and frequent data replication across components that are deployed across multiple zones, and automatic failover between those components if a zone failure occurs.
Some applications that you deploy on zonal VMs provide built-in high availability support, such as by being replica-aware. For example, if you use SQL Server on Azure VMs, availability groups provide traffic routing and failover capabilities. You can select whether you want to use synchronous or asynchronous replication. For more information, see Business continuity and HADR for SQL Server on Azure VMs.
Design for disaster recovery
Disaster recovery is distinct from high availability because, in a disaster scenario, you can tolerate more downtime and data loss. RTO and RPO are usually measured in hours or longer.
A disaster recovery plan helps you to plan for the different situations you might encounter and how you expect to respond to them by using a combination of automated and manual processes.
The following disaster recovery approaches can help when you're planning a zonal deployment:
Azure Site Recovery zone-to-zone disaster recovery: This approach is useful when you need disk-level asynchronous replication between VMs in different zones. For more information, see Enable Azure VM disaster recovery between availability zones.
Azure Site Recovery region-to-region disaster recovery: Azure Site Recovery supports region-to-region disaster recovery, and relies on asynchronous replication. This approach enables you to fail over to a zone in another Azure region instead of another zone in your primary region. For more information, see Replicate Azure VMs to another Azure region.
Backup-based disaster recovery: If your solution can tolerate a high RTO and high RPO, you can consider using backups as a disaster recovery strategy. If the zone experiences an outage, you can then restore backups into another zone or region. You also need to consider whether you precreate the other Azure resources in your solution, or if you create them during the failover process.
In a zonal architecture, you're often responsible for storing and replicating those backups.
Azure Backup is a commonly used service that provides managed backup. It supports zone-redundant backups, and geo-replicated backups in paired Azure regions. For more information, see What is the Azure Backup service? Some applications provide built-in backup capabilities. For example, SQL Server on Azure VMs provides a range of backup capabilities.
For more information about other approaches you can consider, see Recommendations for using availability zones and regions in the Azure Well-Architected Framework.