Reliability in Azure App Service

Azure App Service is an HTTP-based service for hosting web applications, REST APIs, and mobile back ends. App Service integrates with Azure to provide security, load balancing, autoscaling, and automated management for applications. As an Azure service, App Service provides a range of capabilities to support your reliability requirements.

When you use Azure, reliability is a shared responsibility. Azure provides a range of capabilities to support resiliency and recovery. You're responsible for understanding how those capabilities work within all of the services you use, and selecting the capabilities you need to meet your business objectives and uptime goals.

This article describes how to make App Service resilient to a variety of potential outages and problems, including transient faults, availability zone outages, region outages, and service maintenance. It also describes how you can use backups to recover from other types of problems, and highlights some key information about the App Service service level agreement (SLA).

Note

If you are looking for information about reliability support in App Service Environment, see Reliability in App Service Environment.

Production deployment recommendations

The Azure Well-Architected Framework provides recommendations across reliability, performance, security, cost, and operations. To understand how these areas influence each other and contribute to a reliable App Service solution, see Architecture best practices for App Service (Web Apps) in the Azure Well-Architected Framework.

Reliability architecture overview

When you create an App Service web app, you specify the App Service plan that runs the app.

An App Service plan defines a set of compute resources that run your web apps. All web apps must run inside a plan. You can scale a plan to run on multiple VM instances, also called workers. These instances provide the compute resources that run your app code. A single App Service plan can host multiple apps. All apps run on the same shared set of VM instances.

App Service provides the following redundancy features:

Distribution across fault domains: At the platform level, Azure automatically distributes your App Service plan's VM instances across fault domains within the Azure region. A fault domain is a group of VMs that share a common power source and network switch, so spreading instances across fault domains minimizes the effect of localized hardware failures.
Distribution across availability zones: If you enable zone redundancy on a supported App Service plan, Azure distributes your instances across availability zones within the region. This configuration provides higher resiliency if a zone outage occurs. For more information about zone redundancy, see Availability zone support.
App scaling: When you configure your App Service plan to run multiple VM instances, all apps in the plan run on all instances by default. If you configure your plan for autoscaling, all apps scale out together based on the autoscale settings. However, you can customize how many plan instances run a specific app by using per-app scaling.
Scale units: Internally, App Service runs on a platform infrastructure called scale units, also known as stamps or webspaces. A scale unit includes all components needed to host and run App Service, including compute, storage, networking, and load balancing. Azure manages scale units to ensure balanced workload distribution, perform routine maintenance, and maintain overall platform reliability.

Some capabilities might only be applied to specific scale units. For example, some App Service scale units might support zone redundancy, while other scale units in the same region don't.
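The default and per-app scaling behavior described in the list above can be modeled in a few lines. This is a minimal Python sketch, not an App Service API: `instances_per_app` and the app names are hypothetical. In Azure Resource Manager, per-app scaling is typically enabled with the plan's `perSiteScaling` property and limited for each app through `siteConfig.numberOfWorkers`. Confirm those property names against the per-app scaling documentation before you rely on them.

```python
from typing import Dict, Optional


def instances_per_app(
    plan_instances: int, app_limits: Dict[str, Optional[int]]
) -> Dict[str, int]:
    """Model how many plan instances each app runs on (illustrative only)."""
    result: Dict[str, int] = {}
    for app, limit in app_limits.items():
        if limit is None:
            # Default behavior: every app runs on every instance in the plan.
            result[app] = plan_instances
        else:
            # Per-app scaling: the app runs on at most `limit` instances,
            # and never on more instances than the plan has.
            result[app] = min(limit, plan_instances)
    return result


# Hypothetical plan with four instances that hosts two apps.
print(instances_per_app(4, {"api": None, "admin": 1}))  # {'api': 4, 'admin': 1}
```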

Resilience to transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.
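Retrying with backoff is the usual way to handle the transient faults described above. The following Python sketch shows capped exponential backoff with full jitter around any callable. `TransientError` and `retry_with_backoff` are hypothetical names. In practice, map your client library's retryable errors (for example, timeouts or HTTP 503 responses) to the retry decision, prefer a built-in retry policy when your SDK provides one, and retry only operations that are safe to repeat.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying TransientError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last transient error.
            # Backoff doubles each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time in [0, delay] so clients don't retry in lockstep.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # The loop always returns or raises.


# Usage: an operation that fails twice (for example, while an instance restarts).
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("instance restarting")
    return "ok"


print(retry_with_backoff(flaky_call, sleep=lambda seconds: None))  # ok
```

The `sleep` parameter is injectable so that tests can skip real waiting.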

Microsoft-provided SDKs usually handle transient faults. Because App Service hosts your own application code, also take steps to reduce the chance of transient faults:

Deploy multiple instances in your plan. App Service performs automated updates and other forms of maintenance on instances in your plan. If an instance becomes unhealthy, the service can automatically replace that instance with a new healthy instance. During the replacement process, there can be a short period when the previous instance is unavailable and a new instance isn't ready to serve traffic. To mitigate these effects, deploy multiple instances of your App Service plan.
Use deployment slots. App Service deployment slots enable zero-downtime deployments of your applications. Use deployment slots to minimize the effect of deployments and configuration changes on your users. Deployment slots also reduce the likelihood that your application restarts, and an application restart causes a transient fault.
Avoid scaling up or scaling down. These operations change the CPU, memory, and other resources assigned to each instance, and they can trigger an application restart. Instead, select a tier and instance size that meet your performance requirements under typical load. To handle changes in traffic volume, scale out and scale in by dynamically adding and removing instances.

Resilience to availability zone failures

Availability zones are physically separate groups of datacenters within an Azure region. When one zone fails, services can fail over to one of the remaining zones.

For Premium v2 to v3 tiers, you can configure an App Service plan as zone redundant, which means that its instances are distributed across multiple availability zones. Distribution across multiple zones helps your production workloads achieve resiliency and reliability. When you configure zone redundancy on an App Service plan, all apps that use the plan become zone redundant.

Requirements

To enable zone redundancy, you must meet the following requirements:

Region support: For App Service Premium v2 and v3 plans, zone redundancy is supported in any region that supports availability zones.
Plan type: Use Premium v2 to v3 plan types.
Minimum number of instances: Deploy a minimum of two instances in your plan.
Scale unit: Your app must be deployed to a scale unit that supports availability zones. You don't directly control the scale unit that your plan uses. Instead, when you create an App Service plan, the plan is assigned to a scale unit based on the plan's resource group. To determine whether the scale unit for your App Service plan supports zone redundancy, see Check for zone redundancy support for an App Service plan.

If your App Service plan is on a scale unit that doesn't support zone redundancy, you can't enable zone redundancy on your plan. Instead, you need to redeploy your apps to a new plan on a different scale unit.
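If you already have a plan's resource JSON, the support check can be expressed as a small function. This Python sketch assumes that the JSON exposes the maximumNumberOfZones value under `properties`, treats a missing value as unsupported, and treats two or more zones as indicating support, because a zone-redundant deployment always uses at least two availability zones. The function name and sample plan are hypothetical.

```python
import json
from typing import Any, Dict


def supports_zone_redundancy(plan: Dict[str, Any]) -> bool:
    """Return True if the plan's scale unit reports at least two usable zones."""
    # Assumption: the value lives at properties.maximumNumberOfZones.
    raw = plan.get("properties", {}).get("maximumNumberOfZones")
    if raw is None:
        # Treat a missing value as "not supported" rather than guessing.
        return False
    # A zone-redundant deployment always uses at least two availability zones.
    return int(raw) >= 2


# Hypothetical plan JSON, trimmed to the field that this check reads.
plan_json = '{"name": "example-plan", "properties": {"maximumNumberOfZones": 3}}'
print(supports_zone_redundancy(json.loads(plan_json)))  # True
```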

Instance distribution across zones

When you create a zone-redundant App Service plan, Azure distributes the plan's instances across availability zones in the region. This distribution ensures that your apps remain available even if one zone experiences an outage.

Instance distribution in a zone-redundant deployment follows specific rules. These rules also apply as the app scales in and out:

Minimum instances: Your App Service plan must have a minimum of two instances for zone redundancy.
Maximum availability zones supported by your plan: Azure determines the number of availability zones that your plan can use, which is referred to as maximumNumberOfZones. To view the number of availability zones that your specific plan can use, see Check for zone redundancy support for an App Service plan.

Note

The number of availability zones available to your plan (maximumNumberOfZones) varies by scale unit and region. A zone-redundant deployment always uses at least two availability zones, and might use more depending on your scale unit. Regardless of the number of zones, a zone-redundant deployment provides resilience to a single zone failure and offers the same SLA.
Instance distribution: When zone redundancy is enabled, Azure distributes plan instances across multiple availability zones automatically. The distribution is based on the following rules:
- If the number of instances is greater than maximumNumberOfZones and is evenly divisible by it, Azure distributes the instances evenly across the zones.
- If the number of instances isn't evenly divisible by maximumNumberOfZones, Azure distributes the instances as evenly as possible and places the remaining instances in the remaining zones.
- When the App Service platform allocates instances for a zone-redundant App Service plan, it uses the best-effort zone balancing that the underlying Azure virtual machine scale sets provide. A plan is considered balanced when each zone has the same number of instances, or when the instance counts of any two zones differ by no more than one. For more information, see Zone balancing.
Physical zone placement: You can view the physical availability zone used for each of your App Service plan instances. For more information, see View physical zones for an App Service plan.
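The distribution rules above, combined with the zone-balancing definition, can be approximated with integer division. This is an illustrative Python model, not the platform's allocation algorithm: real placement is best effort and can differ. `distribute_instances` is a hypothetical name, and the model assumes that when a plan has fewer instances than maximumNumberOfZones, each instance lands in a different zone.

```python
from typing import List


def distribute_instances(instance_count: int, max_zones: int) -> List[int]:
    """Return per-zone instance counts for a balanced zone-redundant layout."""
    if instance_count < 2:
        raise ValueError("a zone-redundant plan needs at least two instances")
    if max_zones < 2:
        raise ValueError("zone redundancy needs at least two availability zones")
    # Assumption: with fewer instances than zones, each instance gets its own zone.
    zones_used = min(instance_count, max_zones)
    base, remainder = divmod(instance_count, zones_used)
    # Every zone gets `base` instances and `remainder` zones get one more,
    # so no two zones differ by more than one instance (the "balanced" rule).
    return [base + 1 if zone < remainder else base for zone in range(zones_used)]


print(distribute_instances(6, 3))  # [2, 2, 2]  (evenly divisible)
print(distribute_instances(7, 3))  # [3, 2, 2]  (one remaining instance)
```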

Considerations

For Premium v2 to v3 plans, an availability zone outage might affect some aspects of Azure App Service, even though the application continues to serve traffic. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

When you enable zone redundancy on your App Service Premium v2 to v3 plan, you also improve resiliency during platform updates. For more information, see Reliability during service maintenance.

For App Service plans that aren't configured as zone redundant, the underlying virtual machine (VM) instances aren't resilient to availability zone failures. They can experience downtime during an outage in any zone in that region.

Cost

When you use App Service Premium v2 to v3 plans, enabling availability zones doesn't add cost if you have two or more instances. Charges are based on your App Service plan SKU, the capacity that you specify, and any instances that you scale to based on your autoscale criteria.

If you enable availability zones but specify a capacity of less than two, the platform enforces a minimum instance count of two and charges you for both instances.
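As a worked example of this cost rule, the billed instance count for a zone-redundant plan is the larger of two and the number of instances that you request or that autoscale adds. This Python sketch covers only the instance count; the price per instance depends on your SKU and isn't modeled. `billed_instances` is a hypothetical helper.

```python
def billed_instances(requested_instances: int, zone_redundant: bool) -> int:
    """Instances you pay for, given the plan's configured or autoscaled count."""
    if requested_instances < 1:
        raise ValueError("a plan runs at least one instance")
    if zone_redundant:
        # The platform enforces (and bills) a minimum of two instances.
        return max(2, requested_instances)
    return requested_instances


print(billed_instances(1, zone_redundant=True))   # 2
print(billed_instances(5, zone_redundant=True))   # 5  (for example, after autoscale)
print(billed_instances(1, zone_redundant=False))  # 1
```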

Configure availability zone support

Create a new zone-redundant App Service plan. For more information, see Create a new App Service plan that includes zone redundancy.
Enable or disable zone redundancy on an existing App Service plan. For more information, see Set zone redundancy for an existing App Service plan.
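For automation, creating a zone-redundant plan comes down to a plan resource with zone redundancy enabled. This Python sketch only builds an Azure Resource Manager request body for a `Microsoft.Web/serverfarms` resource and doesn't call Azure. It assumes the `zoneRedundant` property and a Premium v3 SKU (`P1v3`, tier `PremiumV3`); the region and capacity are example values, and other plan properties (such as the OS kind) are omitted. Check the field names against the current Microsoft.Web REST API reference.

```python
import json
from typing import Any, Dict


def zone_redundant_plan_body(
    location: str, sku_name: str, tier: str, capacity: int
) -> Dict[str, Any]:
    """Build a minimal ARM body for a zone-redundant App Service plan (sketch)."""
    return {
        "location": location,  # Use a region that supports availability zones.
        "sku": {
            "name": sku_name,
            "tier": tier,
            # For zone-redundant plans, the platform enforces a minimum of two instances.
            "capacity": capacity,
        },
        "properties": {
            "zoneRedundant": True,
        },
    }


body = zone_redundant_plan_body("eastus2", "P1v3", "PremiumV3", capacity=3)
print(json.dumps(body, indent=2))
```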

Capacity planning and management

To prepare for availability zone failure, consider over-provisioning the capacity of your App Service plan. This approach allows the solution to tolerate some capacity loss and continue to function without degraded performance. For more information, see Manage capacity by using over-provisioning.
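To size over-provisioning, you can work out how many instances keep enough capacity after the busiest zone is lost. In a balanced layout of T instances across Z zones, the largest zone holds ceil(T / Z) instances, so you need the smallest T for which T - ceil(T / Z) >= N, where N is the instance count that your typical load requires. This works out to T = ceil(N * Z / (Z - 1)). The following Python sketch assumes balanced placement, which the platform provides only on a best-effort basis; `instances_for_zone_loss` is a hypothetical helper.

```python
import math


def instances_for_zone_loss(required: int, zones: int) -> int:
    """Smallest instance count that keeps `required` instances after losing one zone."""
    if required < 1:
        raise ValueError("required must be at least 1")
    if zones < 2:
        raise ValueError("surviving a zone loss needs at least two zones")
    total = max(required, 2)  # Zone-redundant plans run at least two instances.
    # With balanced placement, the busiest zone holds ceil(total / zones) instances.
    # Losing that zone is the worst case, so grow total until the rest suffice.
    while total - math.ceil(total / zones) < required:
        total += 1
    return total


print(instances_for_zone_loss(4, zones=3))  # 6
print(instances_for_zone_loss(4, zones=2))  # 8
```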

Behavior when all zones are healthy

The following list describes what to expect when App Service plans are configured for zone redundancy and all availability zones are operational:

Traffic routing between zones: During normal operations, traffic is routed between all available App Service plan instances across all availability zones.
Data replication between zones: During normal operations, any state stored in your application's file system is stored in zone-redundant storage and synchronously replicated between availability zones.

Behavior during a zone failure

An availability zone outage might affect some aspects of App Service, even though the application continues to serve traffic. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

The following list describes what to expect when App Service plans are configured for zone redundancy and one or more availability zones are unavailable:

Detection and response: The App Service platform automatically detects failures in an availability zone and initiates a response. No manual intervention is required to initiate a zone failover.

Notification: Azure doesn't automatically notify you when a zone is down. However, you can use Azure Resource Health to monitor for the health of an individual resource, and you can set up Resource Health alerts to notify you of problems. You can also use Azure Service Health to understand the overall health of the service, including any zone failures, and you can set up Service Health alerts to notify you of problems.

Active requests: Any in-progress requests that connect to an App Service plan instance in the affected availability zone are terminated. Clients should retry those requests.
Traffic rerouting: App Service detects the lost instances from that zone and attempts to find new replacement instances. After App Service finds replacements, it distributes traffic across the new instances as needed.

If autoscale is configured and determines that more instances are needed, it requests instances from App Service. Autoscale behavior operates independently of App Service platform behavior, so your instance count specification doesn't need to be a multiple of two. For more information, see Scale up an app in App Service and Autoscale overview.

Important

Azure doesn't guarantee that requests for more instances succeed in a zone-down scenario. The platform attempts to backfill lost instances on a best-effort basis. If you need guaranteed capacity during an availability zone failure, create and configure your App Service plans to account for zone loss by over-provisioning the capacity.
Nonruntime behaviors: Applications in a zone-redundant App Service plan continue to run and serve traffic even if an availability zone experiences an outage. However, nonruntime behaviors might be affected during an availability zone outage. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

Zone recovery

When the availability zone recovers, App Service automatically creates instances in the recovered availability zone, removes any temporary instances created in the other availability zones, and routes traffic between your instances as usual.

Test for zone failures

The App Service platform manages traffic routing, failover, and failback for zone-redundant App Service plans. This feature is fully managed, so you don't need to initiate or validate availability zone failure processes.

Resilience to region-wide failures

App Service is a single-region service. If the region becomes unavailable, your application is also unavailable.

Custom multi-region solutions for resiliency

To reduce the risk that a single-region failure affects your application, you can deploy App Service plans in multiple regions. The following steps help strengthen resilience:

Deploy your application to the plans in each region.
Configure load balancing and failover policies.
Replicate your data across regions so that you can recover your last application state.
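A failover policy is usually implemented by a global load balancer such as Azure Front Door or Azure Traffic Manager, both of which support priority-based routing. To show the decision that a priority policy makes, here's a minimal Python sketch that returns the highest-priority healthy regional endpoint. The endpoint names, URLs, and health function are hypothetical; a real health probe would send requests to each regional app.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RegionalEndpoint:
    region: str
    url: str
    priority: int  # Lower number = preferred region.


def select_endpoint(
    endpoints: Sequence[RegionalEndpoint],
    is_healthy: Callable[[RegionalEndpoint], bool],
) -> Optional[RegionalEndpoint]:
    """Return the highest-priority healthy endpoint, or None if every region is down."""
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical two-region deployment of the same app.
endpoints = [
    RegionalEndpoint("westeurope", "https://myapp-weu.example.net", priority=2),
    RegionalEndpoint("northeurope", "https://myapp-neu.example.net", priority=1),
]
down = {"northeurope"}  # Simulate an outage in the primary region.
chosen = select_endpoint(endpoints, lambda e: e.region not in down)
print(chosen.region if chosen else "no healthy region")  # westeurope
```

Routing only moves traffic. You still need to replicate data so that the secondary region can serve your latest application state.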

Backup and restore

When you use the Basic tier or higher, you can back up your App Service apps to a file by using the App Service backup and restore capabilities.

These capabilities help when it's difficult to redeploy code or when you store state on disk. Most solutions shouldn't rely exclusively on backups. Instead, use the other capabilities in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't.

Important

Starting March 31, 2028, Azure App Service custom backups will no longer support backing up linked databases. See Deprecation of linked database backups for more information.

Instead, use the native backup and restore tools of your linked database. For more information, see Back up and restore your app in App Service.

Resilience to service maintenance

App Service performs regular service upgrades and other maintenance tasks. To maintain your expected capacity during an upgrade, the platform automatically adds extra instances of the App Service plan during the upgrade process.

Enable zone redundancy. When you enable zone redundancy on your App Service plan, you also improve resiliency during platform updates. Update domains consist of collections of VMs that go offline during an update, and they map to availability zones. Deploying multiple instances in your App Service plan and enabling zone redundancy for your plan adds an extra layer of resiliency if an instance or zone becomes unhealthy during an upgrade.

For more information, see Routine planned maintenance for App Service and Routine maintenance for App Service, restarts, and downtime.

Service-level agreement

The service-level agreement (SLA) for Azure services describes the expected availability of each service and the conditions that your solution must meet to achieve that availability expectation. For more information, see SLAs for online services.

When you deploy a zone-redundant App Service plan, the uptime percentage defined in the SLA increases. The same SLA applies regardless of the number of zones available on the underlying scale unit.

Last updated on 2026-04-13