为 SQL Server 设置灾难恢复Set up disaster recovery for SQL Server

本文介绍如何帮助保护应用程序的 SQL Server 后端。This article describes how to help protect the SQL Server back end of an application. 为此,可以结合使用 SQL Server 业务连续性和灾难恢复 (BCDR) 技术与 Azure Site RecoveryYou do so by using a combination of SQL Server business continuity and disaster recovery (BCDR) technologies and Azure Site Recovery.

在开始之前,请确保了解 SQL Server 灾难恢复功能。Before you start, make sure you understand SQL Server disaster recovery capabilities. 这些功能包括:These capabilities include:

  • 故障转移群集Failover clustering
  • Always On 可用性组Always On availability groups
  • 数据库镜像Database mirroring
  • 日志传送Log shipping
  • 活动异地复制Active geo-replication
  • 自动故障转移组Auto-failover groups

将 BCDR 技术与 Site Recovery 相结合Combining BCDR technologies with Site Recovery

应该根据下表中所述的恢复时间目标 (RTO) 和恢复点目标 (RPO) 需求,选择适当的 BCDR 技术来恢复 SQL Server 实例。Your choice of a BCDR technology to recover SQL Server instances should be based on your recovery time objective (RTO) and recovery point objective (RPO) needs as described in the following table. 将 Site Recovery 与所选技术的故障转移操作相结合,以协调整个应用程序的恢复。Combine Site Recovery with the failover operation of your chosen technology to orchestrate recovery of your entire application.

部署类型Deployment type BCDR 技术BCDR technology SQL Server 的预期 RTOExpected RTO for SQL Server SQL Server 的预期 RPOExpected RPO for SQL Server
Azure 基础结构即服务 (IaaS) 虚拟机 (VM) 上的或本地的 SQL Server。SQL Server on an Azure infrastructure as a service (IaaS) virtual machine (VM) or at on-premises. Always On 可用性组Always On availability group 将次要副本设为主要副本所花费的时间。The time taken to make the secondary replica as primary. 由于复制到次要副本是一种异步操作,因此会有一定的数据丢失。Because replication to the secondary replica is asynchronous, there's some data loss.
Azure IaaS VM 上的或本地的 SQL Server。SQL Server on an Azure IaaS VM or at on-premises. 故障转移群集 (Always On FCI)Failover clustering (Always On FCI) 在节点之间进行故障转移所花费的时间。The time taken to fail over between the nodes. 由于 Always On FCI 使用共享存储,因此故障转移时会提供相同的存储实例视图。Because Always On FCI uses shared storage, the same view of the storage instance is available on failover.
Azure IaaS VM 上的或本地的 SQL Server。SQL Server on an Azure IaaS VM or at on-premises. 数据库镜像(高性能模式)Database mirroring (high-performance mode) 强制服务所花费的时间,使用镜像服务器作为温备用服务器。The time taken to force the service, which uses the mirror server as a warm standby server. 复制是异步的。Replication is asynchronous. 镜像数据库可能稍微滞后于主体数据库。The mirror database might lag somewhat behind the principal database. 滞后时间通常很小。The lag is typically small. 但是,如果主体或镜像服务器的系统负载过大,则滞后时间可能很大。But it can become large if the principal or mirror server's system is under a heavy load.

日志传送可用作数据库镜像的补充。Log shipping can be a supplement to database mirroring. 它是异步数据库镜像的理想替代方案。It's a favorable alternative to asynchronous database mirroring.
Azure 上的 SQL 平台即服务 (PaaS)。SQL as platform as a service (PaaS) on Azure.

此部署类型包括单一数据库和弹性池。This deployment type includes single databases and elastic pools.
活动异地复制Active geo-replication 触发故障转移后持续 30 秒。30 seconds after failover is triggered.

对一个辅助数据库激活故障转移后,所有其他辅助数据库将自动链接到新的主数据库。When failover is activated for one of the secondary databases, all other secondaries are automatically linked to the new primary.
5 秒 RPO。RPO of five seconds.

活动异地复制使用 SQL Server 的 Always On 技术。Active geo-replication uses the Always On technology of SQL Server. 它使用快照隔离以异步方式将主数据库上已提交的事务复制到辅助数据库。It asynchronously replicates committed transactions on the primary database to a secondary database by using snapshot isolation.

保证辅助数据永不包含部分事务。The secondary data is guaranteed to never have partial transactions.
Azure 上配置了活动异地复制的 SQL as PaaS。SQL as PaaS configured with active geo-replication on Azure.

此部署类型包括托管实例、弹性池和单一数据库。This deployment type includes a managed instances, elastic pools, and single databases.
自动故障转移组Auto-failover groups 1 小时 RTO。RTO of one hour. 5 秒 RPO。RPO of five seconds.

自动故障转移组在活动异地复制的顶层提供组语义。Auto-failover groups provide the group semantics on top of active geo-replication. 但使用相同的异步复制机制。But the same asynchronous replication mechanism is used.
Azure IaaS VM 上的或本地的 SQL Server。SQL Server on an Azure IaaS VM or at on-premises. 使用 Azure Site Recovery 进行复制Replication with Azure Site Recovery RTO 通常小于 15 分钟。RTO is typically less than 15 minutes. 有关详细信息,请阅读 Site Recovery 提供的 RTO SLATo learn more, read the RTO SLA provided by Site Recovery. 为应用程序一致性提供 1 小时保证,为崩溃一致性提供 5 分钟保证。One hour for application consistency and five minutes for crash consistency. 若要寻求降低 RPO,请使用其他 BCDR 技术。If you are looking for lower RPO, use other BCDR technologies.


使用 Site Recovery 帮助保护 SQL 工作负荷时需要考虑到几个重要因素:A few important considerations when you're helping to protect SQL workloads with Site Recovery:

  • Site Recovery 是应用程序不可知的。Site Recovery is application agnostic. Azure Site Recovery 可帮助保护部署在受支持操作系统上的任何 SQL Server 版本。Site Recovery can help protect any version of SQL Server that is deployed on a supported operating system. 有关详细信息,请参阅复制的计算机的恢复支持矩阵To learn more, see the support matrix for recovery of replicated machines.
  • 对于 Azure、Hyper-V、VMware 或物理基础结构中的任何部署,都可以选择使用 Site Recovery。You can choose to use Site Recovery for any deployment at Azure, Hyper-V, VMware, or physical infrastructure. 请遵照本文档末尾的指导来了解如何使用 Site Recovery 帮助保护 SQL Server 群集Please follow the guidance at the end of this article on how to help protect a SQL Server cluster with Site Recovery.
  • 确保在计算机上观测到的数据更改率在 Site Recovery 限制范围内。Ensure that the data change rate observed on the machine is within Site Recovery limits. 更改率以每秒写入字节数度量。The change rate is measured in write bytes per second. 对于运行 Windows 的计算机,可以选择任务管理器中的“性能”选项卡来查看此更改率。 For machines running Windows, you can view this change rate by selecting the Performance tab in Task Manager. 观测每个磁盘的写入速度。Observe the write speed for each disk.
  • Site Recovery 支持复制存储空间直通上的故障转移群集实例。Site Recovery supports replication of Failover Cluster Instances on Storage Spaces Direct. 有关详细信息,请参阅如何启用存储空间直通复制To learn more, see how to enable Storage Spaces Direct replication.

将 SQL 工作负载迁移到 Azure 时,建议应用 Azure 虚拟机上的 SQL Server 的性能准则When you migrate your SQL Workload to Azure, it is recommended to apply the Performance guidelines for SQL Server on Azure Virtual Machines.

应用程序的灾难恢复Disaster recovery of an application

Site Recovery 借助恢复计划来协调整个应用程序的测试故障转移和故障转移。Site Recovery orchestrates the test failover and the failover of your entire application with the help of recovery plans.

需要满足一些先决条件才能确保根据需要完全自定义恢复计划。There are some prerequisites to ensure your recovery plan is fully customized according to your need. 任何 SQL Server 部署通常都需要一个 Active Directory 部署。Any SQL Server deployment typically needs an Active Directory deployment. 它还需要应用层的连接。It also needs connectivity for your application tier.

步骤 1:设置 Active DirectoryStep 1: Set up Active Directory

在辅助恢复站点上安装 Active Directory,使 SQL Server 能够正常运行。Set up Active Directory in the secondary recovery site for SQL Server to run properly.

  • 小型企业:你有少量的应用程序和适用于本地站点的单个域控制器。Small enterprise: You have a small number of applications and a single domain controller for the on-premises site. 若要故障转移整个站点,请使用 Site Recovery 复制。If you want to fail over the entire site, use Site Recovery replication. 此服务会将域控制器复制到辅助数据中心或 Azure。This service replicates the domain controller to the secondary datacenter or to Azure.
  • 大中型企业:你可能需要设置其他域控制器。Medium to large enterprise: You might need to set up additional domain controllers.
    • 如果你有大量的应用程序、使用 Active Directory 林,并且想要按应用程序或工作负荷进行故障转移,请在辅助数据中心或 Azure 中设置另一个域控制器。If you have a large number of applications, have an Active Directory forest, and want to fail over by application or workload, set up another domain controller in the secondary datacenter or in Azure.
    • 如果你使用 Always On 可用性组恢复到远程站点,请在辅助站点或 Azure 中设置另一个域控制器。If you're using Always On availability groups to recover to a remote site, set up another domain controller on the secondary site or in Azure. 此域控制器供已恢复的 SQL Server 实例使用。This domain controller is used for the recovered SQL Server instance.

本文中的说明假设辅助位置中提供了域控制器。The instructions in this article assume that a domain controller is available in the secondary location. 有关详细信息,请参阅使用 Site Recovery 帮助保护 Active Directory 的过程。To learn more, see the procedures for helping to protect Active Directory with Site Recovery.

步骤 2:确保与其他层建立连接Step 2: Ensure connectivity with other tiers

在目标 Azure 区域中运行数据库层后,确保与应用层和 Web 层建立连接。After the database tier is running in the target Azure region, ensure that you have connectivity with the application and web tiers. 提前采取必要的步骤来验证与测试故障转移建立的连接。Take the necessary steps in advance to validate connectivity with test failover.

通过以下示例了解如何根据连接注意事项设计应用程序:To understand how you can design applications for connectivity considerations, see these examples:

步骤 3:与 Always On、活动异地复制和自动故障转移组互操作Step 3: Interoperate with Always On, active geo-replication, and auto-failover groups

BCDR 技术 Always On、活动异地复制和自动故障转移组为目标 Azure 区域中运行的 SQL Server 提供辅助副本。BCDR technologies Always On, active geo-replication, and auto-failover groups have secondary replicas of SQL Server running in the target Azure region. 应用程序故障转移的第一步是将此副本指定为主副本。The first step for your application failover is to specify this replica as primary. 此步骤假设次要区域中已有一个域控制器。This step assumes you already have a domain controller in the secondary. 如果你选择执行自动故障转移,则可能不需要执行该步骤。The step may not be necessary if you choose to do an auto-failover. 只有在完成数据库故障转移后,才故障转移 Web 层和应用层。Fail over your web and application tiers only after the database failover is completed.


如果已使用 Site Recovery 来帮助保护 SQL 计算机,则只需创建这些计算机的恢复组,并在恢复计划中添加其故障转移。If you have helped to protect the SQL machines with Site Recovery, you just need to create a recovery group of these machines and add their failover in the recovery plan.

使用应用层和 Web 层虚拟机创建恢复计划Create a recovery plan with application and web tier virtual machines. 以下步骤说明如何添加数据库层的故障转移:The following steps show how to add failover of the database tier:

  1. 导入相应的脚本,用于在资源管理器虚拟机经典虚拟机中对 SQL 可用性组进行故障转移。Import the scripts to fail over SQL Availability Group in both a Resource Manager virtual machine and a classic virtual machine. 将脚本导入到 Azure 自动化帐户中。Import the scripts into your Azure Automation account.


    必须修改从 GitHub 存储库“azure-quickstart-templates”下载或参考的模板,以适应 Azure 中国云环境。Templates you downloaded or referenced from the GitHub Repo "azure-quickstart-templates" must be modified in order to fit in the Azure China Cloud Environment. 例如,替换某些终结点(将“blob.core.windows.net”替换为“blob.core.chinacloudapi.cn”,将“cloudapp.azure.com”替换为“cloudapp.chinacloudapi.cn”);必要时更改某些不受支持的位置、VM 映像、VM 大小、SKU 以及资源提供程序的 API 版本。For example, replace some endpoints -- "blob.core.windows.net" by "blob.core.chinacloudapi.cn", "cloudapp.azure.com" by "cloudapp.chinacloudapi.cn"; change some unsupported Location, VM images, VM sizes, SKU and resource-provider's API Version when necessary.

    选择以下 Deploy to Azure 后,选择 Edit template 并根据 Azure 中国环境更新特定项。After select the below Deploy to Azure, select Edit template and update the specific items to meet Azure China Environment.

    • 在第 14 行将 automationRegion 参数的 allowedValues 属性替换为以下项。Replace the allowedValues propertis of automationRegion parameter with following items on the Line 14. chinaeast2,chinanorth,chinanorth2
    • 选择“保存”。Select save.

    “部署到 Azure”徽标图像Image of a "Deploy to Azure" logo

  2. 将 ASR-SQL-FailoverAG 脚本添加为恢复计划的第一个组的准备操作。Add the ASR-SQL-FailoverAG script as a pre-action of the first group of the recovery plan.

  3. 遵照脚本中的说明创建自动化变量。Follow the instructions available in the script to create an automation variable. 此变量提供可用性组的名称。This variable provides the name of the availability groups.

步骤 4:执行测试故障转移Step 4: Conduct a test failover

某些 BCDR 技术(例如 SQL Always On)原生并不支持测试故障转移。Some BCDR technologies such as SQL Always On don't natively support test failover. 我们建议仅在使用此类技术时才运用以下方法。 We recommend the following approach only when using such technologies.

  1. 在 Azure 中托管可用性组副本的 VM 上设置 Azure 备份Set up Azure Backup on the VM that hosts the availability group replica in Azure.

  2. 触发对恢复计划进行测试故障转移之前,请从上一步骤中进行的备份恢复 VM。Before triggering test failover of the recovery plan, recover the VM from the backup taken in the previous step.

    显示用于从 Azure 备份还原配置的窗口屏幕截图

  3. 在从备份还原的 VM 中强制仲裁Force a quorum in the VM that was restored from backup.

  4. 将侦听器的 IP 地址更新为测试故障转移网络中的可用地址。Update the IP address of the listener to be an address available in the test failover network.

    规则窗口和 IP 地址属性对话框的屏幕截图

  5. 使侦听器联机。Bring the listener online.

    标有 Content_AG 的窗口屏幕截图,其中显示了服务器名称和状态

  6. 确保故障转移网络中的负载均衡器有一个 IP 地址,它来自与每个可用性组侦听器对应的前端 IP 地址池,以及后端池中的 SQL Server VM。Ensure that the load balancer in the failover network has one IP address, from the front-end IP address pool that corresponding to each availability group listener, and with the SQL Server VM in the back-end pool.

    标题为“SQL-AlwaysOn-LB - 前端 IP 池”的窗口屏幕截图

    标题为“SQL-AlwaysOn-LB - 后端 IP 池”的窗口屏幕截图

  7. 在后续恢复组中,依次为此恢复计划中的应用层和 Web 层添加故障转移。In later recovery groups, add failover of your application tier followed by your web tier for this recovery plan.

  8. 执行恢复计划的测试性故障转移,以测试应用程序的端到端故障转移。Do a test failover of the recovery plan to test end-to-end failover of your application.

执行故障转移的步骤Steps to do a failover

在步骤 3 中添加脚本并在步骤 4 中验证该脚本后,可以执行步骤 3 中创建的恢复计划的故障转移。After you add the script in Step 3 and validate it in Step 4, you can do a failover of the recovery plan created in Step 3.

在测试故障转移和故障转移恢复计划中,应用层和 Web 层的故障转移步骤应该相同。The failover steps for application and web tiers should be the same in both test failover and failover recovery plans.

如何帮助保护 SQL Server 群集How to help protect a SQL Server cluster

对于运行 SQL Server Standard Edition 或 SQL Server 2008 R2 的群集,建议使用 Site Recovery 复制来帮助保护 SQL Server。For a cluster running SQL Server Standard edition or SQL Server 2008 R2, we recommend you use Site Recovery replication to help protect SQL Server.

Azure 到 Azure,以及本地到 AzureAzure to Azure and On-premises to Azure

复制到 Azure 区域时,Site Recovery 不支持来宾群集。Site Recovery doesn't provide guest cluster support when replicating to an Azure region. SQL Server Standard Edition 也不提供低成本灾难恢复解决方案。SQL Server Standard edition also doesn't provide a low-cost disaster recovery solution. 在此方案中,我们建议在主要位置的独立 SQL Server 实例中保护 SQL Server 群集,并在次要位置恢复它。In this scenario, we recommend you protect the SQL Server cluster to a standalone SQL Server instance in the primary location and recover it in the secondary.

  1. 在主要 Azure 区域或本地站点中配置其他独立 SQL Server 实例。Configure an additional standalone SQL Server instance on the primary Azure region or at on-premises site.

  2. 将此实例配置为需要帮助保护的数据库的镜像。Configure the instance to serve as a mirror for the databases you want to help protect. 在高安全模式下配置镜像。Configure mirroring in high-safety mode.

  3. 在主要站点上为 AzureHyper-VVMware VM 和物理服务器配置 Site Recovery。Configure Site Recovery on the primary site for Azure, Hyper-V, or VMware VMs and physical servers.

  4. 使用 Site Recovery 复制将新的 SQL Server 实例复制到次要站点。Use Site Recovery replication to replicate the new SQL Server instance to the secondary site. 由于该实例是高安全性镜像副本,因此会将它与主群集同步,但会使用 Site Recovery 复制来复制它。As it's a high-safety mirror copy, it will be synchronized with the primary cluster but replicated using Site Recovery replication.

    标准群集插图,其中显示了主要站点、Site Recovery 和 Azure 之间的关系与流

故障回复注意事项Failback considerations

对于 SQL Server Standard 群集,完成计划外故障转移后进行故障回复需要执行 SQL Server 备份和还原。For SQL Server Standard clusters, failback after an unplanned failover requires a SQL Server backup and restore. 从镜像实例对原始群集执行此操作,并重建镜像。This operation is done from the mirror instance to the original cluster with re-establishment of the mirror.

常见问题Frequently asked questions

与 Site Recovery 配合使用时如何获得 SQL Server 的授权?How does SQL Server get licensed when used with Site Recovery?

SQL Server 的 Site Recovery 复制已涵盖在“软件保障 - 灾难恢复”权益中。Site Recovery replication for SQL Server is covered under the Software Assurance disaster recovery benefit. 这项权益适用于所有 Site Recovery 方案:本地到 Azure 的灾难恢复,以及跨区域的 Azure IaaS 灾难恢复。This coverage applies to all Site Recovery scenarios: on-premises to Azure disaster recovery and cross-region Azure IaaS disaster recovery. 有关详细信息,请参阅 Azure Site Recovery 定价See Azure Site Recovery pricing for more.

Site Recovery 是否支持我的 SQL Server 版本?Will Site Recovery support my SQL Server version?

Site Recovery 是应用程序不可知的。Site Recovery is application agnostic. Azure Site Recovery 可帮助保护部署在受支持操作系统上的任何 SQL Server 版本。Site Recovery can help protect any version of SQL Server that is deployed on a supported operating system. 有关详细信息,请参阅复制的计算机的恢复支持矩阵For more, see the support matrix for recovery of replicated machines.

后续步骤Next steps