Best practices for disaster recovery with Azure File Sync

For Azure File Sync, there are two main areas to consider for disaster recovery: high availability, and data redundancy. This article covers each area and helps you decide what configuration to use for your own disaster recovery solution.

In an Azure File Sync deployment, the cloud endpoint always contains a full copy of your data, while the on-premises server can be viewed as a disposable cache of your data. In the event of a server-side disaster, you can recover by provisioning a new server, installing the Azure File Sync agent on it, and setting it up as a new server endpoint.

Due to its hybrid nature, some traditional server backup and disaster recovery strategies won't work with Azure File Sync. For any registered server, Azure File Sync doesn't support:

Warning

Taking any of these actions may lead to issues with sync or broken tiered files that result in eventual data loss. If you've taken one of these actions, contact Azure support to ensure your deployment is healthy.

  • Transferring/cloning disk drives (volume) from one server to another while the server endpoint is still active
  • Restoring from an operating system backup
  • Cloning a server's operating system to another server
  • Reverting to a previous virtual machine checkpoint
  • Restoring tiered files from on-premises (third party) backup to the server endpoint

High availability

There are two different strategies you can use to achieve high availability for your on-premises server. You can either configure a failover cluster, or configure a standby server. The deciding factor between either configuration is how much you're willing to invest in your system, and if minimizing the length of time your system is down in the case of a disaster is worth that extra cost.

For a failover cluster, you don't need to take any special steps to use Azure File Sync. For a standby server, you should make the following configurations:

Have a secondary server with different server endpoints that sync to the same sync group as your primary server but don't enable end-user access to the server. This allows all files to sync from the primary server to the standby server. You can consider enabling namespace-only tiering so that only the namespace is downloaded initially. If your primary server fails, you can use DFS-N to quickly reconfigure end-user access to your standby server.

Data redundancy

To ensure a robust disaster recovery solution, add some form of data redundancy to your infrastructure. There are four redundancy offerings for Azure Files: Locally-redundant storage (LRS), zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS).

  • Locally-redundant storage (LRS): With LRS, every file is stored three times within an Azure storage cluster. This protects against loss of data due to hardware faults, such as a bad disk drive. However, if a disaster such as fire or flooding occurs within the data center, all replicas of a storage account using LRS may be lost or unrecoverable.
  • Zone-redundant storage (ZRS): With ZRS, three copies of each file stored, however these copies are physically isolated in three distinct storage clusters in different Azure availability zones. Availability zones are unique physical locations within an Azure region. Each zone is made up of one or more data centers equipped with independent power, cooling, and networking. A write to storage is not accepted until it is written to the storage clusters in all three availability zones.
  • Geo-redundant storage (GRS): With GRS, you have two regions, a primary and secondary region. Files are stored three times within an Azure storage cluster in the primary region. Writes are asynchronously replicated to a Azure-defined secondary region. GRS provides six copies of your data spread between two Azure regions.
  • Geo-zone-redundant storage (GZRS): You can think of GZRS as if it were like ZRS but with geo-redundancy. With GZRS, files are stored three times across three distinct storage clusters in the primary region. All writes are then asynchronously replicated to a Microsoft-defined secondary region.

For a robust disaster recovery solution, most customers should consider ZRS. ZRS adds the least amount of extra cost for its added data redundancy benefits and is also the most seamless in the event of an outage. If your organization's policy or regulatory requirements require geo-redundancy for your data, consider either GRS or GZRS.

Geo-redundancy

If your storage account is configured with either GRS or GZRS replication, Azure will initiate the failover of the Storage Sync Service if the primary region is judged to be permanently unrecoverable or unavailable for a long time. No action is required from you in the event of a disaster.

Although you can manually request a failover of your Storage Sync Service to your GRS or GZRS paired region, we don't recommend doing this outside of large-scale regional outages because the process isn't seamless and might incur extra cost. To initiate the process, open a support ticket and request that both your Azure storage accounts that contain your Azure file share and your Storage Sync Service be failed over.

Warning

You must contact support to request your Storage Sync Service be failed over if you are initiating this process manually. Attempting to create a new Storage Sync Service using the same server endpoints in the secondary region might result in extra data staying in your storage account because the previous installation of Azure File Sync won't be cleaned up.

Once a failover occurs, server endpoints will switch over to sync with the cloud endpoint in the secondary region automatically. However, the server endpoints must reconcile with the cloud endpoints. This might result in file conflicts, as the data in the secondary region might not be caught up to the primary.