Failover for business continuity and disaster recovery with Azure Cosmos DB for MongoDB vCore

To maximize your uptime, plan ahead to maintain business continuity and prepare for disaster recovery with Azure Cosmos DB for MongoDB vCore.

While Azure services are designed to maximize uptime, unplanned service outages might occur. A disaster recovery plan ensures that you have a strategy in place for handling regional service outages.

In this article, learn how to:

  • Plan a multi-regional deployment of Azure Cosmos DB for MongoDB vCore and associated resources.
  • Design your solutions for high availability.
  • Initiate a failover to another Azure region.

Important

Azure Cosmos DB for MongoDB vCore does not provide built-in automatic failover or disaster recovery. Planning for high availability is a critical step as your solution scales.

Azure Cosmos DB for MongoDB vCore automatically backs up your data at regular intervals. Backups run in the background without affecting the performance or availability of database operations, and they are stored separately from the source data in a storage service. These backups are useful when you accidentally delete or modify resources and later need the original versions.

Automatic backups are retained for different periods, depending on whether the cluster is currently active or was recently deleted.

Cluster state       Retention period
Active clusters     35 days
Deleted clusters    7 days
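
As a rough illustration of the retention periods above, the following Python sketch computes when a retention window ends. The function name `retention_deadline` and the choice of reference timestamp are assumptions made for this example: the article does not say whether the period is counted from the backup time, the deletion time, or another event, so check the service documentation before relying on the arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Retention periods taken from the table above.
RETENTION_DAYS = {"active": 35, "deleted": 7}


def retention_deadline(reference_time: datetime, cluster_state: str) -> datetime:
    """Return the end of the retention window.

    Assumption: the window is counted from `reference_time`, for example
    the time a backup was taken or the time a cluster was deleted.
    """
    return reference_time + timedelta(days=RETENTION_DAYS[cluster_state])


def is_retained(reference_time: datetime, cluster_state: str, now: datetime) -> bool:
    """Return True while `now` is still inside the retention window."""
    return now < retention_deadline(reference_time, cluster_state)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(retention_deadline(start, "active"))
    print(retention_deadline(start, "deleted"))
```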

Design for high availability

High availability (HA) should be enabled for critical Azure Cosmos DB for MongoDB vCore clusters running production workloads. In an HA-enabled cluster, each primary shard is paired with a hot-standby (secondary) shard provisioned in another availability zone. Replication between the primary and the secondary shard is synchronous by default. Any modification to the database is persisted on both the primary and the secondary shards before the database returns a response to the client.

The service maintains health checks and heartbeats to each primary and secondary shard of the cluster. If a primary shard becomes unavailable due to a zone or regional outage, the secondary shard is automatically promoted to become the new primary, and a new secondary shard is built for it. In addition, if a secondary shard becomes unavailable, the service automatically creates a new secondary shard with a full copy of the data from the primary.

If the service triggers a failover from the primary to the secondary shard, connections are transparently routed to the new primary shard.
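
Even when the service reroutes connections, an operation that is in flight at the moment of failover may fail and need to be retried. That is an assumption of this example, not a statement from the service documentation. The following minimal Python sketch shows one way an application could retry with exponential backoff. The helper `run_with_retry` and its parameters are hypothetical, and it catches Python's built-in `ConnectionError` only as a stand-in: a real MongoDB driver raises its own exception types, which you would pass through `retry_on`.

```python
import time


def run_with_retry(operation, attempts=5, base_delay=0.5,
                   retry_on=(ConnectionError,), sleep=time.sleep):
    """Call `operation()` and retry on transient connection errors.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the error after the final attempt.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_write():
        # Simulates a write that fails twice while a failover completes.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("primary unavailable")
        return "ok"

    print(run_with_retry(flaky_write, sleep=lambda seconds: None))
```

Only retry operations that are safe to repeat, or make them idempotent, because a write can succeed on the server even when the client sees a connection error.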

Synchronous replication between the primary and secondary shards guarantees no data loss if there's a failover.
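
The reasoning behind this guarantee can be pictured with a toy in-memory model. The sketch below is purely illustrative and is not how the service is implemented. The classes `Shard` and `HaShardPair` are hypothetical names for this example. A write is acknowledged only after both shards have persisted it, so promoting the standby never loses an acknowledged write.

```python
class Shard:
    """Toy stand-in for a shard: an in-memory key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True

    def persist(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


class HaShardPair:
    """Toy model of a primary shard with a synchronous hot standby."""

    def __init__(self):
        self.primary = Shard("primary")
        self.standby = Shard("standby")

    def write(self, key, value):
        # Synchronous replication: persist on both shards before acknowledging.
        self.primary.persist(key, value)
        self.standby.persist(key, value)
        return "ack"

    def fail_over(self):
        # Promote the standby, then build a new standby from a full copy
        # of the new primary's data.
        self.primary = self.standby
        self.standby = Shard("new-standby")
        self.standby.data = dict(self.primary.data)


if __name__ == "__main__":
    pair = HaShardPair()
    pair.write("order-1", {"total": 42})
    pair.primary.available = False  # simulate a zone outage
    pair.fail_over()
    print(pair.primary.data)
```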

Configure high availability

High availability can be specified when creating a new cluster or updating an existing cluster.