Reliability guides for Azure services

This article provides links to reliability guidance for many Azure services. Most reliability guides contain the following information:

  • The reliability architecture overview summarizes how a service supports reliability. It describes which components Azure manages and which components you manage, built-in redundancy features, and, if applicable, how to provision and manage multiple resources.

  • Transient fault handling describes how a service handles the short-lived faults that routinely occur in the cloud. It also describes how to handle these faults in your application, including retry policies, timeouts, and other best practices.

  • Availability zone support describes zonal and zone-redundant deployment options, traffic routing and data replication between zones, zone-outage scenarios, failback processes, and how to configure resources for availability zone support.

  • Multi-region support describes how to configure multi-region or geo-disaster recovery support, traffic routing and data replication between regions, region-down scenarios, failover and failback support, and alternative multi-region approaches.
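As a rough illustration of the retry policies that transient fault handling guidance covers, the following Python sketch retries an operation with capped exponential backoff and full jitter. It isn't taken from any Azure SDK: `TransientError` and `retry_with_backoff` are hypothetical names. In practice, you'd typically rely on the retry policy built into the service's client library and classify faults as the service's guide recommends.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for short-lived faults, such as throttling or a brief timeout."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying only TransientError with capped exponential backoff.

    Any other exception type propagates immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Retry budget exhausted: surface the fault to the caller.
            # Delay ceiling doubles per attempt (0.5 s, 1 s, 2 s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random delay keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_call() -> str:
        # Simulated dependency that fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TransientError("service busy")
        return "ok"

    print(retry_with_backoff(flaky_call))  # prints "ok" after two retries
```

Retrying only a designated set of faults keeps permanent errors, such as a malformed request, from consuming the retry budget, and the cap on the delay bounds how long a single call can wait between attempts.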

Some guides also contain information about the following capabilities:

  • Backup support explains who controls backups, where they're stored and replicated, how to recover them, and whether they can be accessed only within a region or across regions.

  • Service-level agreements (SLAs) define the expected uptime and describe how the expected uptime changes based on the configuration that you use.
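To make uptime percentages concrete, the following Python sketch converts an uptime percentage into the downtime it permits over a period, assuming a 30-day month by default. For example, 99.9% uptime over 30 days allows about 43.2 minutes of downtime. The function name and the percentages are illustrative assumptions, not the SLA of any particular Azure service; each service's SLA defines its own measurement period and terms.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime, in minutes, that an uptime percentage permits over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return (1 - uptime_percent / 100) * period_minutes


if __name__ == "__main__":
    # Illustrative percentages only; compare how much downtime each level permits.
    for uptime in (99.9, 99.95, 99.99):
        print(f"{uptime}% -> {allowed_downtime_minutes(uptime):.1f} min per 30 days")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is why a configuration with a higher SLA can matter more than the small change in the percentage suggests.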

Reliability guides by service

The following table provides links to reliability guidance for Azure services. Each guide contains information about how the service supports reliability features.

Note

Some services don't have a single reliability guide. For these services, the table might list more than one article that contains reliability guidance.

| Service | Reliability guide | Other reliability documentation |
| --- | --- | --- |
| Azure AI Health Insights | Reliability in AI Health Insights | |
| Azure AI Search | Reliability in AI Search | |
| Azure API Management | Reliability in API Management | |
| Azure App Configuration | | App Configuration and high data availability; Resiliency and disaster recovery |
| Azure App Service | Reliability in App Service | |
| App Service Environment | Reliability in App Service Environment | |
| Azure Application Gateway v2 | | Autoscaling and high availability |
| Azure Backup | Reliability in Backup | |
| Azure Batch | Reliability in Batch | |
| Azure Blob Storage | Reliability in Blob Storage | |
| Azure Cache for Redis | | Enable zone redundancy for Azure Cache for Redis; Configure passive geo-replication for Premium Azure Cache for Redis instances |
| Azure Container Apps | Reliability in Container Apps | |
| Azure Container Registry | Reliability in Container Registry | |
| Azure Cosmos DB for NoSQL | Reliability in Azure Cosmos DB for NoSQL | |
| Azure Data Box | | Recover data if an entire region fails |
| Azure Data Explorer | | Business continuity and disaster recovery overview |
| Azure Data Factory | Reliability in Data Factory | |
| Azure Database for MySQL | | High availability concepts in Azure Database for MySQL Flexible Server |
| Azure Database for MySQL Flexible Server | | High availability concepts in Azure Database for MySQL Flexible Server; Point-in-time restore in Azure Database for MySQL |
| Azure DDoS Protection | Reliability in DDoS Protection | |
| Azure DevOps | | Data protection overview |
| Azure Disk Encryption | | Redundancy options for managed disks |
| Azure DNS | Reliability in Azure DNS | |
| Azure ExpressRoute | Reliability in Azure ExpressRoute | |
| Azure Files | Reliability in Azure Files | |
| Azure Firewall | Reliability in Azure Firewall | |
| Azure Functions | Reliability in Azure Functions | |
| Azure Key Vault | Reliability in Key Vault | |
| Azure Kubernetes Service (AKS) | Reliability in AKS | |
| Azure Load Balancer | Reliability in Load Balancer | |
| Azure Logic Apps | Reliability in Logic Apps | |
| Azure Machine Learning | | Failover for business continuity and disaster recovery |
| Azure managed disks | | Best practices for achieving high availability by using Azure virtual machines and managed disks |
| Azure Media Services | | High availability by using Media Services and video on demand (VOD) |
| Azure Migrate | | Azure Migrate and backup and disaster recovery |
| Azure Monitor Logs | | Enhance data and service resilience in Azure Monitor Logs by using availability zones; Azure Monitor Logs workspace replication |
| Azure Network Watcher | | Network Watcher service availability and redundancy |
| Azure Private Link | | Private Link availability |
| Azure public IP addresses | | Azure public IP addresses availability zone |
| Azure Queue Storage | Reliability in Queue Storage | |
| Azure Route Server | | Route Server frequently asked questions (FAQs) |
| Azure Service Bus | | Best practices for insulating applications against Service Bus outages and disasters |
| Azure Service Fabric | | Deploy a Service Fabric cluster across availability zones; Disaster recovery in Service Fabric |
| Azure SignalR Service | | Resiliency and disaster recovery in Azure SignalR Service |
| Azure Site Recovery | | Set up disaster recovery for Azure virtual machines |
| Azure SQL Database | Reliability in Azure SQL Database | |
| Azure SQL Managed Instance | Reliability in Azure SQL Managed Instance | |
| Azure Stream Analytics | | Achieve geo-redundancy for Stream Analytics jobs |
| Azure Table Storage | Reliability in Table Storage | |
| Azure Traffic Manager | Reliability in Traffic Manager | |
| Azure Virtual Machines | Reliability in Virtual Machines | |
| Azure VM Image Builder | Reliability in VM Image Builder | |
| Azure Virtual Machine Scale Sets | Reliability in Virtual Machine Scale Sets | |
| Azure Virtual Network | Reliability in Virtual Network | |
| Azure Virtual WAN | | Availability zones and resiliency in Virtual WAN; Disaster recovery design |
| Azure VPN Gateway | | About zone-redundant virtual network gateway in Azure availability zones; Highly available cross-premises and virtual network-to-virtual network connectivity |
| Azure Web Application Firewall | | Deploy Azure Firewall with availability zones by using Azure PowerShell; Achieve a disaster recovery scenario across datacenters by using Application Gateway |
| Microsoft Purview | Reliability in Microsoft Purview | |