Reliability guides by service

This article provides links to reliability guidance for many Azure services. Most reliability guides contain the following information:

  • Reliability architecture overview is a synopsis of how the service supports reliability. It includes information about which components Azure manages and which components you manage, built-in redundancy features, and how to provision and manage multiple resources, if applicable.

  • Transient fault handling describes how the service handles day-to-day transient faults that can occur in the cloud. It also describes how to handle these faults in your application, including information about retry policies, timeouts, and other best practices.

  • Availability zone support describes zonal and zone-redundant deployment options, traffic routing and data replication between zones, what happens when a zone experiences an outage, failback, and how to configure your resources for availability zone support.

  • Multi-region support describes how to configure multi-region or geo-disaster recovery support, traffic routing and data replication between regions, the region-down experience, failover and failback support, and alternative multi-region support.
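As an illustration of the retry policies mentioned under transient fault handling, the following is a minimal sketch of exponential backoff with jitter. It is not taken from any Azure SDK: `TransientError`, `call_with_retries`, `make_flaky`, and all default values are hypothetical names and numbers chosen for this example. Many Azure client libraries provide their own built-in retry behavior, so check the service's reliability guide before adding a custom policy.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (throttling, brief network loss)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(); retry transient failures with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the fault to the caller
            # The cap doubles on each attempt up to max_delay; the actual wait is a
            # random fraction of the cap so that many clients don't retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(cap * rng())


def make_flaky(failures):
    """Build a hypothetical operation that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("simulated transient fault")
        return "ok"

    return operation


if __name__ == "__main__":
    waits = []  # record the waits instead of sleeping, so the demo finishes instantly
    print(call_with_retries(make_flaky(2), sleep=waits.append), waits)
```

Only errors classified as transient are retried, and the attempt limit keeps a persistent outage from turning into an unbounded retry loop. Timeouts are left to the wrapped operation in this sketch.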

Some guides also contain information about:

  • Backup support, such as who controls backups, where they're stored and replicated to, how they can be recovered, and whether they're accessible only within a region or across regions.

  • Service-level agreements (SLAs) for availability, including how the expected uptime changes based on the configuration that you use.
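To make concrete how expected uptime can change with configuration, here is a rough sketch of the usual availability arithmetic. The function names and the percentages in the usage lines are hypothetical and are not the SLA of any particular Azure service. The redundancy formula also assumes that deployments fail independently, which is optimistic in practice.

```python
def serial_availability(*availabilities):
    """Composite availability when every component must be up (dependencies in series)."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def redundant_availability(availability, copies):
    """Availability when any one of `copies` independent deployments is sufficient."""
    return 1.0 - (1.0 - availability) ** copies


def downtime_minutes_per_month(availability, days=30):
    """Downtime budget implied by an availability figure over a month of `days` days."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    # Hypothetical figures: two services that a request must pass through in sequence.
    print(serial_availability(0.9995, 0.9999))
    # Hypothetical figure: one 99.9% deployment versus two independent copies of it.
    print(redundant_availability(0.999, 1), redundant_availability(0.999, 2))
    print(downtime_minutes_per_month(0.999))
```

For scale, a 99.9% target over a 30-day month allows about 43.2 minutes of downtime. Chaining dependencies lowers the composite figure below that of the weakest component, while adding redundant deployments raises it.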

Reliability guides by service

This section provides links to reliability guidance for many Azure services. Each service guide contains information about how the service supports reliability features.

Note

Some service documents don't follow a single reliability guide format. These services might list more than one article that references reliability guidance.

Product | Reliability guide | Other reliability documentation
Azure AI Health Insights | Reliability in AI Health Insights |
Azure AI Search | Reliability in AI Search |
Azure API Management | Reliability in API Management |
Azure App Configuration | | How does App Configuration ensure high data availability?; Resiliency and disaster recovery
Azure App Service | Reliability in App Service |
App Service Environment | Reliability in App Service Environment |
Azure Application Gateway v2 | | Autoscaling and high availability
Azure Backup | Reliability in Backup |
Azure Batch | Reliability in Batch |
Azure Blob Storage | Reliability in Blob Storage |
Azure Cache for Redis | | Enable zone redundancy for Azure Cache for Redis; Configure passive geo-replication for Premium Azure Cache for Redis instances
Azure Container Apps | Reliability in Container Apps |
Azure Container Registry | Reliability in Container Registry |
Azure Cosmos DB for NoSQL | Reliability in Azure Cosmos DB for NoSQL |
Azure Data Box | | How can I recover my data if an entire region fails?
Azure Data Explorer | | Business continuity and disaster recovery overview
Azure Database for MySQL | | High availability concepts in Azure Database for MySQL flexible server
Azure Database for MySQL flexible server | | High availability concepts in Azure Database for MySQL flexible server; Point-in-time restore in Azure Database for MySQL
Azure Database for PostgreSQL flexible server | Reliability in Azure Database for PostgreSQL flexible server |
Azure DDoS Protection | Reliability in DDoS Protection |
Azure DevOps | | Data protection overview
Azure Disk Encryption | | Redundancy options for managed disks
Azure DNS | Reliability in Azure DNS |
Azure ExpressRoute | | Design for high availability with ExpressRoute; Design for disaster recovery with ExpressRoute private peering
Azure Files | | Choose the right redundancy option; Disaster recovery and failover for Azure Files
Azure Firewall | Reliability in Azure Firewall |
Azure Functions | Reliability in Azure Functions |
Azure Key Vault | Reliability in Key Vault |
Azure Kubernetes Service (AKS) | Reliability in AKS |
Azure Load Balancer | Reliability in Load Balancer |
Azure Logic Apps | Reliability in Logic Apps |
Azure Machine Learning | | Failover for business continuity and disaster recovery
Azure managed disks | | Best practices for achieving high availability with Azure virtual machines and managed disks
Azure Media Services | | High availability by using Media Services and video on demand (VOD)
Azure Migrate | | Does Azure Migrate offer backup and disaster recovery?
Azure Monitor Logs | | Enhance data and service resilience in Azure Monitor Logs by using availability zones
Azure Network Watcher | | Network Watcher service availability and redundancy
Azure Private Link | | Private Link availability
Azure public IP addresses | | Azure public IP addresses availability zone
Azure Queue Storage | Reliability in Queue Storage |
Azure Route Server | | Route Server frequently asked questions (FAQ)
Azure Service Bus | | Best practices for insulating applications against Service Bus outages and disasters
Azure Service Fabric | | Deploy a Service Fabric cluster across availability zones; Disaster recovery in Service Fabric
Azure SignalR Service | | Resiliency and disaster recovery in Azure SignalR Service
Azure Site Recovery | | Set up disaster recovery for Azure virtual machines
Azure SQL Database | | Azure SQL Database - High availability; Disaster recovery guidance - Azure SQL Database
Azure SQL Managed Instance | | Failover groups overview and best practices - Azure SQL Managed Instance
Azure Stream Analytics | | Achieve geo-redundancy for Stream Analytics jobs
Azure Table Storage | Reliability in Azure Table Storage |
Azure Traffic Manager | Reliability in Traffic Manager |
Azure Virtual Machines | Reliability in Virtual Machines |
Azure VM Image Builder | Reliability in VM Image Builder |
Azure Virtual Machine Scale Sets | Reliability in Virtual Machine Scale Sets |
Azure Virtual Network | Reliability in Virtual Network |
Azure Virtual WAN | | How are availability zones and resiliency handled in Virtual WAN?; Disaster recovery design
Azure VPN Gateway | | About zone-redundant virtual network gateway in Azure availability zones; Highly Available cross-premises and virtual network-to-virtual network connectivity
Azure Web Application Firewall | | Deploy Azure Firewall with availability zones by using Azure PowerShell; How do I achieve a disaster recovery scenario across datacenters by using Application Gateway?
Microsoft Purview | Reliability in Microsoft Purview |