Azure 虚拟机中 SQL Server 的高可用性和灾难恢复High availability and disaster recovery for SQL Server in Azure Virtual Machines

带有 SQL Server 的 Azure 虚拟机 (VM) 有助于降低高可用性和灾难恢复 (HADR) 数据库解决方案的成本。Azure virtual machines (VMs) with SQL Server can help lower the cost of a high availability and disaster recovery (HADR) database solution. Azure 虚拟机支持大多数 SQL Server HADR 解决方案,这些解决方案既可以仅包含 Azure,也可是混合解决方案。Most SQL Server HADR solutions are supported in Azure virtual machines, both as Azure-only and as hybrid solutions. 在仅包含 Azure 的解决方案中,整个 HADR 系统都在 Azure 中运行。In an Azure-only solution, the entire HADR system runs in Azure. 而在混合配置中,解决方案的一部分在 Azure 中运行,另一部分则在组织的本地运行。In a hybrid configuration, part of the solution runs in Azure and the other part runs on-premises in your organization. Azure 环境具有灵活性,允许部分或完全迁移至 Azure,以满足 SQL Server 数据库系统对于预算和 HADR 的要求。The flexibility of the Azure environment enables you to move partially or completely to Azure to satisfy the budget and HADR requirements of your SQL Server database systems.

Note

Azure 具有用于创建和处理资源的两个不同的部署模型:资源管理器部署模型和经典部署模型Azure has two different deployment models for creating and working with resources: Resource Manager and classic. 这篇文章介绍了如何使用这两种模型,但 Azure 建议大多数最新部署使用 Resource Manager 模型。This article covers using both models, but Azure recommends that most new deployments use the Resource Manager model.

了解对 HADR 解决方案的需求Understanding the need for an HADR solution

有责任确保数据库系统拥有服务级别协议 (SLA) 要求的 HADR 功能。It is up to you to ensure that your database system possesses the HADR capabilities that the service-level agreement (SLA) requires. Azure 提供了高可用性机制,例如云服务的服务修复和虚拟机的故障恢复检测,但这一事实自身并不保证你能够达到所需 SLA 的要求。The fact that Azure provides high availability mechanisms, such as service healing for cloud services and failure recovery detection for Virtual Machines, does not itself guarantee you can meet the desired SLA. 这些机制可以保护 VM 的高可用性,但不能保护在 VM 内部运行的 SQL Server 的高可用性。These mechanisms protect the high availability of the VMs but not the high availability of SQL Server running inside the VMs. VM 联机并正常运行时,SQL Server 实例也可能会出故障。It is possible for the SQL Server instance to fail while the VM is online and healthy. 再者,即便是 Azure 提供的高可用性机制,也会在 VM 遇到从软件或硬件故障进行恢复、操作系统升级等事件时,为其留出一定的停机时间。Moreover, even the high availability mechanisms provided by Azure allow for downtime of the VMs due to events such as recovery from software or hardware failures and operating system upgrades.

此外,使用异地复制功能在 Azure 中实现的异地冗余存储 (GRS),可能不适合作为数据库的灾难恢复解决方案。In addition, Geo Redundant Storage (GRS) in Azure, which is implemented with a feature called geo-replication, may not be an adequate disaster recovery solution for your databases. 因为异地复制功能会异步发送数据,在发生灾难的情况下,最近的更新可能丢失。Because geo-replication sends data asynchronously, recent updates can be lost in the event of disaster. 数据和日志文件各自在不同磁盘上的情况下不支持异地复制 部分中提供了有关异地复制限制的详细信息。More information regarding geo-replication limitations are covered in the Geo-replication not supported for data and log files on separate disks section.

HADR 部署体系结构HADR deployment architectures

Azure 支持的 SQL Server HADR 技术包括:SQL Server HADR technologies that are supported in Azure include:

可将多种技术配合使用,以实现具有高可用性和灾难恢复功能的 SQL Server 解决方案。It is possible to combine the technologies together to implement a SQL Server solution that has both high availability and disaster recovery capabilities. 根据所用技术的不同,混合部署可能需要使用 VPN 隧道连接 Azure 虚拟网络。Depending on the technology you use, a hybrid deployment may require a VPN tunnel with the Azure virtual network. 以下部分显示了某些部署体系结构的示例。The sections below show you some of the example deployment architectures.

仅限 Azure:高可用性解决方案Azure-only: High availability solutions

可针对在数据库级别具有 Always On 可用性组(称为可用性组)的 SQL Server 提供高可用性解决方案。You can have a high availability solution for SQL Server at a database level with Always On Availability Groups - called availability groups. 还可以在具有 Always On 故障转移群集实例的实例级别创建高可用性解决方案。You can also create a high availability solution at an instance level with Always On Failover Cluster Instances - failover cluster instances. 对于其他冗余,可以通过在故障转移群集实例上创建可用性组,在两个级别上创建冗余。For additional redundancy, you can create redundancy at both levels by creating availability groups on failover cluster instances.

技术Technology 示例体系结构Example Architectures
可用性组Availability groups 在同一区域的 Azure VM 中运行的可用性副本提供高可用性。Availability replicas running in Azure VMs in the same region provide high availability. 需要配置域控制器 VM,因为 Windows 故障转移群集需要 Active Directory 域。You need to configure a domain controller VM, because Windows failover clustering requires an Active Directory domain.
可用性组
故障转移群集实例Failover cluster instances 可通过 3 种不同的方式创建需要共享存储的故障转移群集实例 (FCI)。Failover Cluster Instances (FCI), which require shared storage, can be created in 3 different ways.

1.通过 Windows Server 2016 存储空间直通 (S2D) 在具有附加存储的 Azure VM 中运行的双节点故障转移群集,用于提供基于软件的虚拟 SAN。1. A two-node failover cluster running in Azure VMs with attached storage using Windows Server 2016 Storage Spaces Direct (S2D) to provide a software-based virtual SAN.

2.在其存储由第三方群集解决方案支持的 Azure VM 中运行的双节点故障转移群集。2. A two-node failover cluster running in Azure VMs with storage supported by a third-party clustering solution. 有关使用 SIOS DataKeeper 的具体示例,请参阅使用故障转移群集和第三方软件 SIOS Datakeeper 的文件共享的高可用性For a specific example that uses SIOS DataKeeper, see High availability for a file share using failover clustering and 3rd party software SIOS DataKeeper.

3.通过 ExpressRoute 在具有远程 iSCSI 目标共享块存储中运行的双节点故障转移群集。3. A two-node failover cluster running in Azure VMs with remote iSCSI Target shared block storage via ExpressRoute. 例如,NetApp 专用存储 (NPS) 使用 Equinix 通过 ExpressRoute 向 Azuer VM 公开 iSCSI 目标。For example, NetApp Private Storage (NPS) exposes an iSCSI target via ExpressRoute with Equinix to Azure VMs.

对于第三方共享存储和数据复制解决方案,如有任何关于在故障转移时访问数据的问题,请联系供应商。For third-party shared storage and data replication solutions, you should contact the vendor for any issues related to accessing data on failover.

请注意,目前尚不支持在 Azure 文件存储外部使用 FCI,因为此解决方案不使用高级存储。Note that using FCI on top of Azure File storage is not supported yet, because this solution does not utilize Premium Storage. 我们正在努力很快支持此功能。We are working to support this soon.

仅限 Azure:灾难恢复解决方案Azure-only: Disaster recovery solutions

可将可用性组、数据库镜像或备份和还原与存储 Blob 配合使用,为 Azure 中的 SQL Server 数据库提供灾难恢复解决方案。You can have a disaster recovery solution for your SQL Server databases in Azure using availability groups, database mirroring, or backup and restore with storage blobs.

技术Technology 示例体系结构Example Architectures
可用性组Availability Groups 可用性副本在 Azure VM 中跨多个数据中心运行以实现灾难恢复。Availability replicas running across multiple datacenters in Azure VMs for disaster recovery. 这种跨区域解决方案可以防止站点完全中断。This cross-region solution protects against complete site outage.
可用性组Availability Groups
在一个区域内,所有副本应该位于同一云服务和同一 VNet 中。Within a region, all replicas should be within the same cloud service and the same VNet. 由于每个区域会有单独的 VNet,因此这些解决方案需要 VNet 到 VNet 连接。Because each region will have a separate VNet, these solutions require VNet to VNet connectivity. 有关详细信息,请参阅使用 Azure 门户配置 VNet 到 VNet 连接For more information, see Configure a VNet-to-VNet connection using the Azure portal. 有关详细说明,请参阅在不同区域的 Azure 虚拟机上配置 SQL Server 可用性组For detailed instructions, see Configure a SQL Server Availability Group on Azure Virtual Machines in Different Regions.
数据库镜像Database Mirroring 主体和镜像以及服务器在不同数据库中运行以实现灾难恢复。Principal and mirror and servers running in different datacenters for disaster recovery. 必须使用服务器证书进行部署。You must deploy using server certificates.
数据库镜像
使用 Azure Blob 存储服务进行备份和还原Backup and Restore with Azure Blob Storage Service 生产数据库直接备份到不同数据中心内的 Blob 存储以实现灾难恢复。Production databases backed up directly to blob storage in a different datacenter for disaster recovery.
备份和还原Backup and Restore
有关详细信息,请参阅 Azure 虚拟机中 SQL Server 的备份和还原For more information, see Backup and Restore for SQL Server in Azure Virtual Machines.
使用 Azure Site Recovery 将 SQL Server 复制和故障转移到 Azure Replicate and Failover SQL Server to Azure with Azure Site Recovery 一个 Azure 数据中心的生产 SQL Server 直接复制到其他 Azure 数据中心的 Azure 存储以实现灾难恢复。Production SQL Server of one Azure datacenter replicated directly to Azure Storage of different Azure datacenter for disaster recovery.
使用 Azure Site Recovery 进行复制Replicate using Azure Site Recovery
有关详细信息,请参阅使用 SQL Server 灾难恢复和 Azure Site Recovery 来保护 SQL ServerFor more information, see Protect SQL Server using SQL Server disaster recovery and Azure Site Recovery.

混合 IT:灾难恢复解决方案Hybrid IT: Disaster recovery solutions

可将可用性组、数据库镜像、日志传送以及备份和还原与 Azure Blob 存储配合使用,在混合 IT 环境中为 SQL Server 数据库提供灾难恢复解决方案。You can have a disaster recovery solution for your SQL Server databases in a hybrid-IT environment using availability groups, database mirroring, log shipping, and backup and restore with Azure blog storage.

技术Technology 示例体系结构Example Architectures
可用性组Availability Groups 某些可用性副本运行在 Azure VM 中,另一些则在本地运行,以实现跨站点灾难恢复。Some availability replicas running in Azure VMs and other replicas running on-premises for cross-site disaster recovery. 生产站点可位于本地,也可以位于 Azure 数据中心。The production site can be either on-premises or in an Azure datacenter.
可用性组
由于所有可用性副本必须在同一故障转移群集中,因此该群集必须同时跨越这两个网络(多子网故障转移群集)。Because all availability replicas must be in the same failover cluster, the cluster must span both networks (a multi-subnet failover cluster). 此配置需要在 Azure 与本地网络之间使用 VPN 连接。This configuration requires a VPN connection between Azure and the on-premises network.

为了成功地对数据库进行灾难恢复,还应在灾难恢复站点上安装副本域控制器。For successful disaster recovery of your databases, you should also install a replica domain controller at the disaster recovery site.

可以使用 SSMS 中的“添加副本向导”将 Azure 副本添加到现有的 Always On 可用性组。It is possible to use the Add Replica Wizard in SSMS to add an Azure replica to an existing Always On Availability Group. 有关详细信息,请参阅“教程:将 AlwaysOn 可用性组扩展到 Azure”。For more information, see Tutorial: Extend your Always On Availability Group to Azure.
数据库镜像Database Mirroring 一个伙伴在 Azure VM 中运行,另一个则在本地运行,以实现使用服务器证书进行跨站点灾难恢复。One partner running in an Azure VM and the other running on-premises for cross-site disaster recovery using server certificates. 合作伙伴不必在同一 Active Directory 域中,并且不需要 VPN 连接。Partners do not need to be in the same Active Directory domain, and no VPN connection is required.
数据库镜像Database Mirroring
另一种数据库镜像方案是一个合作伙伴在 Azure VM 中运行,另一个则在同一 Active Directory 域中本地运行,以实现跨站点灾难恢复。Another database mirroring scenario involves one partner running in an Azure VM and the other running on-premises in the same Active Directory domain for cross-site disaster recovery. 需要在 Azure 虚拟网络与本地网络之间使用 VPN 连接A VPN connection between the Azure virtual network and the on-premises network is required.

为了成功地对数据库进行灾难恢复,还应在灾难恢复站点上安装副本域控制器。For successful disaster recovery of your databases, you should also install a replica domain controller at the disaster recovery site.
日志传送Log Shipping 一个服务器在 Azure VM 中运行,另一个则在本地运行,以实现跨站点灾难恢复。One server running in an Azure VM and the other running on-premises for cross-site disaster recovery. 日志传送依赖于 Windows 文件共享,因此需要在 Azure 虚拟网络与本地网络之间使用 VPN 连接。Log shipping depends on Windows file sharing, so a VPN connection between the Azure virtual network and the on-premises network is required.
日志传送
为了成功地对数据库进行灾难恢复,还应在灾难恢复站点上安装副本域控制器。For successful disaster recovery of your databases, you should also install a replica domain controller at the disaster recovery site.
使用 Azure Blob 存储服务进行备份和还原Backup and Restore with Azure Blob Storage Service 本地生产数据库直接备份到 Azure Blob 存储以实现灾难恢复。On-premises production databases backed up directly to Azure blob storage for disaster recovery.
备份和还原Backup and Restore
有关详细信息,请参阅 Azure 虚拟机中 SQL Server 的备份和还原For more information, see Backup and Restore for SQL Server in Azure Virtual Machines.
使用 Azure Site Recovery 将 SQL Server 复制和故障转移到 Azure Replicate and Failover SQL Server to Azure with Azure Site Recovery 本地生产 SQL Server 直接复制到 Azure 存储以实现灾难恢复。On-premises production SQL Server replicated directly to Azure Storage for disaster recovery.
使用 Azure Site Recovery 进行复制Replicate using Azure Site Recovery
有关详细信息,请参阅使用 SQL Server 灾难恢复和 Azure Site Recovery 来保护 SQL ServerFor more information, see Protect SQL Server using SQL Server disaster recovery and Azure Site Recovery.

有关 Azure 中的 SQL Server HADR 的重要注意事项Important considerations for SQL Server HADR in Azure

Azure VM、存储和网络的运行特征与本地非虚拟化的 IT 基础结构不同。Azure VMs, storage, and networking have different operational characteristics than an on-premises, non-virtualized IT infrastructure. 需要了解这些区别并设计可适应这些区别的解决方案,才能成功地在 Azure 中实现 HADR SQL Server 解决方案。A successful implementation of a HADR SQL Server solution in Azure requires that you understand these differences and design your solution to accommodate them.

可用性集中的高可用性节点High availability nodes in an availability set

使用 Azure 中的高可用性集,可以将高可用性节点放置在单独的容错域 (FD) 和更新域 (UD) 中。Availability sets in Azure enable you to place the high availability nodes into separate Fault Domains (FDs) and Update Domains (UDs). 要将 Azure VM 放入同一可用性集,必须将这些 VM 部署到同一云服务中。For Azure VMs to be placed in the same availability set, you must deploy them in the same cloud service. 只有同一云服务中的节点可加入同一可用性集。Only nodes in the same cloud service can participate in the same availability set. 有关详细信息,请参阅 管理虚拟机的可用性For more information, see Manage the Availability of Virtual Machines.

故障转移群集在 Azure 网络中的行为Failover cluster behavior in Azure networking

Azure 中的 DHCP 服务不符合 RFC 标准,可能会导致创建某些故障转移群集配置失败,因为向群集网络名称分配了重复的 IP 地址(例如 IP 地址与某个群集节点相同)。The non-RFC-compliant DHCP service in Azure can cause the creation of certain failover cluster configurations to fail, due to the cluster network name being assigned a duplicate IP address, such as the same IP address as one of the cluster nodes. 实现可用性组时,这种情况会产生一个问题,因为它依赖于 Windows 故障转移群集功能。This is an issue when you implement Availability Groups, which depends on the Windows failover cluster feature.

创建两节点群集并使其联机时,请考虑此应用场景:Consider the scenario when a two-node cluster is created and brought online:

  1. 群集联机,NODE1 会为群集网络名称请求一个动态分配的 IP 地址。The cluster comes online, then NODE1 requests a dynamically assigned IP address for the cluster network name.
  2. DHCP 服务除了 NODE1 自身的 IP 地址以外不提供任何 IP 地址,因为 DHCP 服务可以识别请求是否来自 NODE1 自身。No IP address other than NODE1's own IP address is given by the DHCP service, since the DHCP service recognizes that the request comes from NODE1 itself.
  3. Windows 检测到同时向 NODE1 和故障转移群集网络名称分配了一个重复的地址,并且默认群集组未能联机。Windows detects that a duplicate address is assigned both to NODE1 and to the failover cluster network name, and the default cluster group fails to come online.
  4. 默认群集组移至 NODE2,后者将 NODE1 的 IP 地址视为群集 IP 地址,并使默认群集组联机。The default cluster group moves to NODE2, which treats NODE1's IP address as the cluster IP address and brings the default cluster group online.
  5. 当 NODE2 尝试与 NODE1 建立连接时,针对 NODE1 的数据包从不离开 NODE2,因为后者将 NODE1 的 IP 地址解析为其自身。When NODE2 attempts to establish connectivity with NODE1, packets directed at NODE1 never leave NODE2 because it resolves NODE1's IP address to itself. NODE2 无法与 NODE1 建立连接,它会丢失仲裁并关闭群集。NODE2 cannot establish connectivity with NODE1, then loses quorum and shuts down the cluster.
  6. 同时,NODE1 可向 NODE2 发送数据包,但 NODE2 无法回复。In the meantime, NODE1 can send packets to NODE2, but NODE2 cannot reply. NODE1 丢失仲裁并关闭群集。NODE1 loses quorum and shuts down the cluster.

可通过将未使用的静态 IP 地址(如 169.254.1.1 等链接本地 IP 地址)分配给群集网络名称,让群集网络名称联机,从而避免这种情况发生。This scenario can be avoided by assigning an unused static IP address, such as a link-local IP address like 169.254.1.1, to the cluster network name in order to bring the cluster network name online. 若要简化此过程,请参阅在 Azure 中针对可用性组配置 Windows 故障转移群集To simplify this process, see Configuring Windows failover cluster in Azure for availability groups.

可用性组侦听器支持Availability group listener support

运行 Windows Server 2008 R2、Windows Server 2012、Windows Server 2012 R2 和 Windows Server 2016 的 Azure VM 支持可用性组侦听器。Availability group listeners are supported on Azure VMs running Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, and Windows Server 2016. 这种支持的实现,是借助于在 Azure VM 上启用的负载均衡终结点,它们都是可用性组节点。This support is made possible by the use of load-balanced endpoints enabled on the Azure VMs that are availability group nodes. 必须执行特殊的配置步骤,才能让侦听器对在 Azure 中运行和本地运行的客户端应用程序都有效。You must follow special configuration steps for the listeners to work for both client applications that are running in Azure as well as those running on-premises.

设置侦听器时有两个主要选项:“外部(公共)”或“内部”。There are two main options for setting up your listener: external (public) or internal. 外部(公共)侦听器使用面向 Internet 的负载均衡器,并与可通过 Internet 访问的公共虚拟 IP (VIP) 相关联。The external (public) listener uses an internet facing load balancer and is associated with a public Virtual IP (VIP) that is accessible over the internet. 内部侦听器使用内部负载均衡器,且仅支持同一虚拟网络中的客户端。An internal listener uses an internal load balancer and only supports clients within the same Virtual Network. 对于任一负载均衡器类型,都必须启用直接服务器返回。For either load balancer type, you must enable Direct Server Return.

如果可用性组跨多个 Azure 子网(例如,跨 Azure 区域的部署),则客户端连接字符串必须包含“MultisubnetFailover=True”。If the Availability Group spans multiple Azure subnets (such as a deployment that crosses Azure regions), the client connection string must include "MultisubnetFailover=True". 这会导致与不同子网中的副本建立并行连接。This results in parallel connection attempts to the replicas in the different subnets. 有关设置侦听器的说明,请参阅For instructions on setting up a listener, see

仍可通过直接连接到服务实例,单独连接到每个可用性副本。You can still connect to each availability replica separately by connecting directly to the service instance. 此外,由于可用性组与数据库镜像客户端向后兼容,因此可以像数据库镜像伙伴一样连接到可用性副本,只要这些副本配置得类似于数据库镜像即可:Also, since availability groups are backward compatible with database mirroring clients, you can connect to the availability replicas like database mirroring partners as long as the replicas are configured similar to database mirroring:

  • 一个主副本和一个辅助副本One primary replica and one secondary replica
  • 将辅助副本配置为不可读(“可读辅助副本”选项设置为“否”) The secondary replica is configured as non-readable (Readable Secondary option set to No)

下面是一个示例客户端连接字符串,它对应于这个使用 ADO.NET 或 SQL Server 本机客户端的类似于数据库镜像的配置:An example client connection string that corresponds to this database mirroring-like configuration using ADO.NET or SQL Server Native Client is below:

Data Source=ReplicaServer1;Failover Partner=ReplicaServer2;Initial Catalog=AvailabilityDatabase;

有关客户端连接的详细信息,请参阅:For more information on client connectivity, see:

混合 IT 环境中的网络延迟Network latency in hybrid IT

在部署 HADR 解决方案时,应该假设本地网络和 Azure 之间有时可能会存在很高的网络延迟。You should deploy your HADR solution with the assumption that there may be periods of time with high network latency between your on-premises network and Azure. 将副本部署到 Azure 时,同步模式应使用异步提交而非同步提交。When deploying replicas to Azure, you should use asynchronous commit instead of synchronous commit for the synchronization mode. 同时在本地和 Azure 中部署数据库镜像服务器时,请使用高性能模式,而非高安全模式。When deploying database mirroring servers both on-premises and in Azure, use the high-performance mode instead of the high-safety mode.

异地复制支持Geo-replication support

Azure 磁盘中的异地复制不支持将同一数据库的数据文件和日志文件各自存储在不同的磁盘上。Geo-replication in Azure disks does not support the data file and log file of the same database to be stored on separate disks. GRS 独立并异步地复制对每个磁盘的更改。GRS replicates changes on each disk independently and asynchronously. 此机制可保证单个磁盘中异地复制副本上的写顺序,但不能保证多个磁盘的异地复制副本上的写顺序。This mechanism guarantees the write order within a single disk on the geo-replicated copy, but not across geo-replicated copies of multiple disks. 如果将数据库配置为将其数据文件和日志文件各自存储在不同的磁盘上,则灾难后恢复的磁盘所含的数据文件副本可能比日志文件副本新,而这违反了 SQL Server 中的预写日志以及事务的 ACID 属性。If you configure a database to store its data file and its log file on separate disks, the recovered disks after a disaster may contain a more up-to-date copy of the data file than the log file, which breaks the write-ahead log in SQL Server and the ACID properties of transactions. 如果无法对存储帐户禁用异地复制,则应将给定数据库的所有数据和日志文件都保留在同一磁盘上。If you do not have the option to disable geo-replication on the storage account, you should keep all data and log files for a given database on the same disk. 如果因数据库较大而必须使用多个磁盘,则需要部署上面列出的某个灾难恢复解决方案以确保数据冗余。If you must use more than one disk due to the size of the database, you need to deploy one of the disaster recovery solutions listed above to ensure data redundancy.

后续步骤Next steps

如果需要创建使用 SQL Server 的 Azure 虚拟机,请参阅在 Azure 上预配 SQL Server 虚拟机If you need to create an Azure virtual machine with SQL Server, see Provisioning a SQL Server Virtual Machine on Azure.

若要使 Azure VM 上运行的 SQL Server 保持最佳性能,请参阅 Azure 虚拟机中 SQL Server 的性能最佳实践中的指导。To get the best performance from SQL Server running on an Azure VM, see the guidance in Performance Best Practices for SQL Server in Azure Virtual Machines.

有关其他与在 Azure VM 中运行 SQL Server 相关的主题,请参阅 SQL Server on Azure Virtual Machines(Azure 虚拟机上的 SQL Server)。For other topics related to running SQL Server in Azure VMs, see SQL Server on Azure Virtual Machines.

其他资源Other resources