Use auto-failover groups to enable transparent and coordinated failover of multiple databases

Applies to: Azure SQL Database, Azure SQL Managed Instance

The auto-failover groups feature allows you to manage the replication and failover of a group of databases on a server, or of all databases in a managed instance, to another region. It is a declarative abstraction on top of the existing active geo-replication feature, designed to simplify deployment and management of geo-replicated databases at scale. You can initiate failover manually, or you can delegate it to the Azure service based on a user-defined policy. The latter option allows you to automatically recover multiple related databases in a secondary region after a catastrophic failure or other unplanned event that results in full or partial loss of SQL Database or SQL Managed Instance availability in the primary region. A failover group can include one or multiple databases, typically used by the same application. Additionally, you can use the readable secondary databases to offload read-only query workloads. Because auto-failover groups involve multiple databases, these databases must be configured on the primary server. Auto-failover groups support replication of all databases in the group to only one secondary server or instance in a different region.

Note

If you want multiple Azure SQL Database secondaries in the same or different regions, use active geo-replication.

When you use auto-failover groups with an automatic failover policy, any outage that impacts one or several of the databases in the group results in automatic failover. Typically these are incidents that cannot be self-mitigated by the built-in automatic high availability operations. Examples of failover triggers include an incident caused by a SQL Database tenant ring or control ring being down due to an OS kernel memory leak on several compute nodes, or an incident caused by one or more tenant rings being down because the wrong network cable was cut during routine hardware decommissioning. For more information, see SQL Database High Availability.

In addition, auto-failover groups provide read-write and read-only listener endpoints that remain unchanged during failovers. Whether you use manual or automatic failover activation, failover switches all secondary databases in the group to primary. After the database failover is completed, the DNS record is automatically updated to redirect the endpoints to the new region. For the specific RPO and RTO data, see Overview of Business Continuity.

When you use auto-failover groups with an automatic failover policy, any outage that impacts databases on a server or managed instance results in automatic failover. You can manage auto-failover groups using the Azure portal, the Azure CLI, PowerShell, or the REST API.

After failover, ensure that the authentication requirements for your database and server, or instance, are configured on the new primary. For details, see SQL Database security after disaster recovery.

To achieve real business continuity, adding database redundancy between datacenters is only part of the solution. Recovering an application (service) end-to-end after a catastrophic failure requires recovery of all components that constitute the service and any dependent services. Examples of these components include the client software (for example, a browser with custom JavaScript), web front ends, storage, and DNS. It is critical that all components are resilient to the same failures and become available within the recovery time objective (RTO) of your application. Therefore, you need to identify all dependent services and understand the guarantees and capabilities they provide. Then, you must take adequate steps to ensure that your service functions during the failover of the services on which it depends. For more information about designing solutions for disaster recovery, see Designing Cloud Solutions for Disaster Recovery Using active geo-replication.

Terminology and capabilities

  • Failover group (FOG)

    A failover group is a named group of databases managed by a single server or within a managed instance that can fail over as a unit to another region in case all or some primary databases become unavailable due to an outage in the primary region. When it's created for SQL Managed Instance, a failover group contains all user databases in the instance, and therefore only one failover group can be configured on an instance.

    Important

    The name of the failover group must be globally unique within the .database.chinacloudapi.cn domain.

  • Servers

    With servers, some or all of the user databases on a server can be placed in a failover group. A single server also supports multiple failover groups.

  • Primary

    The server or managed instance that hosts the primary databases in the failover group.

  • Secondary

    The server or managed instance that hosts the secondary databases in the failover group. The secondary cannot be in the same region as the primary.

  • Adding single databases to failover group

    You can put several single databases on the same server into the same failover group. If you add a single database to the failover group, it automatically creates a secondary database using the same edition and compute size on the secondary server. You specified that server when the failover group was created. If you add a database that already has a secondary database in the secondary server, that geo-replication link is inherited by the group. When you add a database that already has a secondary database in a server that is not part of the failover group, a new secondary is created in the secondary server.

    Important

    Make sure that the secondary server doesn't have a database with the same name unless it is an existing secondary database. In failover groups for SQL Managed Instance, all user databases are replicated. You cannot pick a subset of user databases for replication in the failover group.

  • Adding databases in elastic pool to failover group

    You can put all or several databases within an elastic pool into the same failover group. If the primary database is in an elastic pool, the secondary is automatically created in the elastic pool with the same name (secondary pool). You must ensure that the secondary server contains an elastic pool with exactly the same name and enough free capacity to host the secondary databases that will be created by the failover group. If you add a database in the pool that already has a secondary database in the secondary pool, that geo-replication link is inherited by the group. When you add a database that already has a secondary database in a server that is not part of the failover group, a new secondary is created in the secondary pool.

  • Initial seeding

    When adding databases, elastic pools, or managed instances to a failover group, there is an initial seeding phase before data replication starts. The initial seeding phase is the longest and most expensive operation. Once initial seeding completes, data is synchronized, and then only subsequent data changes are replicated. The time it takes for the initial seed to complete depends on the size of your data, the number of replicated databases, and the speed of the link between the entities in the failover group. Under normal circumstances, typical seeding speed is 50-500 GB an hour for SQL Database, and 18-35 GB an hour for SQL Managed Instance. Seeding is performed for all databases in parallel. You can use the stated seeding speed, along with the number of databases and the total size of data, to estimate how long the initial seeding phase will take before data replication starts.

    For SQL Managed Instance, the speed of the Express Route link between the two instances also needs to be considered when estimating the time of the initial seeding phase. If the link between the two instances is slower than necessary, the time to seed is likely to be notably impacted. You can use the stated seeding speed, number of databases, total size of data, and the link speed to estimate how long the initial seeding phase will take before data replication starts. For example, for a single 100 GB database, the initial seeding phase would take anywhere from 2.8 - 5.5 hours (at the stated seeding speed of 18-35 GB an hour) if the link is capable of pushing 35 GB per hour. If the link can only transfer 10 GB per hour, then seeding a 100 GB database will take about 10 hours. If there are multiple databases to replicate, seeding is executed in parallel and, when combined with a slow link speed, the initial seeding phase may take considerably longer, especially if the parallel seeding of data from all databases exceeds the available link bandwidth. If the network bandwidth between two instances is limited and you are adding multiple managed instances to a failover group, consider adding them sequentially, one by one.

  • DNS zone

    A unique ID that is automatically generated when a new SQL Managed Instance is created. A multi-domain (SAN) certificate for this instance is provisioned to authenticate the client connections to any instance in the same DNS zone. The two managed instances in the same failover group must share the DNS zone.

    Note

    A DNS zone ID is not required for failover groups created for SQL Database.

  • Failover group read-write listener

    A DNS CNAME record that points to the current primary's URL. It is created automatically when the failover group is created and allows the read-write workload to transparently reconnect to the primary database when the primary changes after failover. When the failover group is created on a server, the DNS CNAME record for the listener URL is formed as <fog-name>.database.chinacloudapi.cn. When the failover group is created on a SQL Managed Instance, the DNS CNAME record for the listener URL is formed as <fog-name>.<zone_id>.database.chinacloudapi.cn.

  • Failover group read-only listener

    A DNS CNAME record for the read-only listener that points to the secondary's URL. It is created automatically when the failover group is created and allows the read-only SQL workload to transparently connect to the secondary using the specified load-balancing rules. When the failover group is created on a server, the DNS CNAME record for the listener URL is formed as <fog-name>.secondary.database.chinacloudapi.cn. When the failover group is created on a SQL Managed Instance, the DNS CNAME record for the listener URL is formed as <fog-name>.secondary.<zone_id>.database.chinacloudapi.cn.

  • Automatic failover policy

    By default, a failover group is configured with an automatic failover policy. Azure triggers failover after the failure is detected and the grace period has expired. The system must verify that the outage cannot be mitigated by the built-in high availability infrastructure due to the scale of the impact. If you want to control the failover workflow from the application, you can turn off automatic failover.

    Note

    Because verification of the scale of the outage and how quickly it can be mitigated involves human actions by the operations team, the grace period cannot be set below one hour. This limitation applies to all databases in the failover group regardless of their data synchronization state.

  • Read-only failover policy

    By default, the failover of the read-only listener is disabled. This ensures that the performance of the primary is not impacted when the secondary is offline. However, it also means that read-only sessions will not be able to connect until the secondary is recovered. If you cannot tolerate downtime for the read-only sessions and can accept temporarily using the primary for both read-only and read-write traffic, at the expense of potential performance degradation of the primary, you can enable failover for the read-only listener by configuring the AllowReadOnlyFailoverToPrimary property. In that case, the read-only traffic is automatically redirected to the primary if the secondary is not available.

  • Planned failover

    Planned failover performs full synchronization between primary and secondary databases before the secondary switches to the primary role. This guarantees no data loss. Planned failover is used in the following scenarios:

    • Perform disaster recovery (DR) drills in production when data loss is not acceptable.
    • Relocate the databases to a different region.
    • Return the databases to the primary region after the outage has been mitigated (failback).
  • Unplanned failover

    Unplanned or forced failover immediately switches the secondary to the primary role without any synchronization with the primary. This operation will result in data loss. Unplanned failover is used as a recovery method during outages when the primary is not accessible. When the original primary is back online, it will automatically reconnect without synchronization and become a new secondary.

  • Manual failover

    You can initiate failover manually at any time regardless of the automatic failover configuration. If an automatic failover policy is not configured, manual failover is required to recover databases in the failover group to the secondary. You can initiate forced or friendly failover (with full data synchronization). The latter could be used to relocate the primary to the secondary region. When failover is completed, the DNS records are automatically updated to ensure connectivity to the new primary.

  • Grace period with data loss

    Because the primary and secondary databases are synchronized using asynchronous replication, the failover may result in data loss. You can customize the automatic failover policy to reflect your application's tolerance to data loss. By configuring GracePeriodWithDataLossHours, you can control how long the system waits before initiating a failover that is likely to result in data loss.

  • Multiple failover groups

    You can configure multiple failover groups for the same pair of servers to control the scale of failovers. Each group fails over independently. If your multi-tenant application uses elastic pools, you can use this capability to mix primary and secondary databases in each pool. This way you can reduce the impact of an outage to only half of the tenants.

    Note

    SQL Managed Instance does not support multiple failover groups.
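The seeding-time estimate described under Initial seeding can be sketched in a few lines of Python. This is only a rough illustration, not an official formula: it assumes that the effective throughput is the slower of the seeding speed and the link speed, and that parallel seeding shares that throughput across all databases. The function names and parameters are hypothetical.

```python
from typing import Tuple


def seeding_hours(total_gb: float, seed_gb_per_hour: float, link_gb_per_hour: float) -> float:
    """Estimate initial seeding time in hours.

    Assumption (not from the service documentation): the effective rate
    is the minimum of the seeding speed and the link speed.
    """
    effective = min(seed_gb_per_hour, link_gb_per_hour)
    if effective <= 0:
        raise ValueError("throughput must be positive")
    return total_gb / effective


def seeding_range_hours(
    total_gb: float,
    link_gb_per_hour: float,
    slow_seed: float = 18.0,  # lower bound quoted for SQL Managed Instance
    fast_seed: float = 35.0,  # upper bound quoted for SQL Managed Instance
) -> Tuple[float, float]:
    """Return (best case, worst case) hours for the quoted seeding range."""
    best = seeding_hours(total_gb, fast_seed, link_gb_per_hour)
    worst = seeding_hours(total_gb, slow_seed, link_gb_per_hour)
    return best, worst


if __name__ == "__main__":
    # A single 100 GB database over a 35 GB/hour link, then a 10 GB/hour link.
    for link in (35.0, 10.0):
        best, worst = seeding_range_hours(100.0, link)
        print(f"link {link:.0f} GB/h: {best:.1f} to {worst:.1f} hours")
```

With a 35 GB per hour link the estimate is roughly 2.9 to 5.6 hours, close to the range quoted for the 100 GB example; with a 10 GB per hour link the link becomes the bottleneck and both bounds collapse to 10 hours.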

Permissions

Permissions for a failover group are managed via Azure role-based access control (Azure RBAC). The SQL Server Contributor role has all the necessary permissions to manage failover groups.

Create failover group

To create a failover group, you need RBAC write access to both the primary and secondary servers, and to all databases in the failover group. For a SQL Managed Instance, you need RBAC write access to both the primary and secondary SQL Managed Instance, but permissions on individual databases are not relevant, because individual SQL Managed Instance databases cannot be added to or removed from a failover group.

Update a failover group

To update a failover group, you need RBAC write access to the failover group, and to all databases on the current primary server or managed instance.

Fail over a failover group

To fail over a failover group, you need RBAC write access to the failover group on the new primary server or managed instance.

Best practices for SQL Database

The auto-failover group must be configured on the primary server and will connect it to the secondary server in a different Azure region. The groups can include all or some databases in these servers. The following diagram illustrates a typical configuration of a geo-redundant cloud application using multiple databases and an auto-failover group.

[Diagram: auto-failover]

Note

See Add SQL Database to a failover group for a detailed step-by-step tutorial on adding a database in SQL Database to a failover group.

When designing a service with business continuity in mind, follow these general guidelines:

Using one or several failover groups to manage failover of multiple databases

One or many failover groups can be created between two servers in different regions (primary and secondary servers). Each group can include one or several databases that are recovered as a unit in case all or some primary databases become unavailable due to an outage in the primary region. The failover group creates each geo-secondary database with the same service objective as the primary. If you add an existing geo-replication relationship to the failover group, make sure the geo-secondary is configured with the same service tier and compute size as the primary.

Important

Creating failover groups between two servers in different subscriptions is not currently supported for Azure SQL Database. If you move the primary or secondary server to a different subscription after the failover group has been created, failover requests and other operations could fail.

Using read-write listener for OLTP workload

When performing OLTP operations, use <fog-name>.database.chinacloudapi.cn as the server URL, and the connections are automatically directed to the primary. This URL does not change after the failover. Note that failover involves updating the DNS record, so client connections are redirected to the new primary only after the client DNS cache is refreshed.
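As a small illustration of the listener naming patterns, the following Python sketch builds the read-write and read-only listener host names for a failover group on a server or on a SQL Managed Instance. The helper name `listener_hosts` and the sample values are hypothetical; only the host name patterns come from this article.

```python
from typing import Dict, Optional

# DNS suffix used throughout this article.
DNS_SUFFIX = "database.chinacloudapi.cn"


def listener_hosts(fog_name: str, zone_id: Optional[str] = None) -> Dict[str, str]:
    """Return the listener host names for a failover group.

    Pass zone_id only for SQL Managed Instance; failover groups created
    for SQL Database do not need a DNS zone ID.
    """
    zone = f".{zone_id}" if zone_id else ""
    return {
        # <fog-name>[.<zone_id>].database.chinacloudapi.cn
        "read_write": f"{fog_name}{zone}.{DNS_SUFFIX}",
        # <fog-name>.secondary[.<zone_id>].database.chinacloudapi.cn
        "read_only": f"{fog_name}.secondary{zone}.{DNS_SUFFIX}",
    }


if __name__ == "__main__":
    # "myfog" and "abc123" are placeholder values for illustration only.
    print(listener_hosts("myfog"))
    print(listener_hosts("myfog", zone_id="abc123"))
```

Because the application only ever refers to these names, nothing in its configuration needs to change when the group fails over.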

Using read-only listener for read-only workload

If you have a logically isolated read-only workload that is tolerant to certain staleness of data, you can use the secondary database in the application. For read-only sessions, use <fog-name>.secondary.database.chinacloudapi.cn as the server URL, and the connection is automatically directed to the secondary. It is also recommended that you indicate read intent in the connection string by using ApplicationIntent=ReadOnly. If you want to ensure that the read-only workload can reconnect after failover or in case the secondary server goes offline, make sure to configure the AllowReadOnlyFailoverToPrimary property of the failover policy.
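A connection string for a read-only session might be assembled as in the sketch below. It only builds the string and does not open a connection. The helper name, the ODBC driver name, the port, and the sample host and database names are assumptions for illustration; authentication settings are deliberately left out.

```python
def build_connection_string(
    host: str,
    database: str,
    read_only: bool = False,
    driver: str = "ODBC Driver 18 for SQL Server",  # assumed driver name
) -> str:
    """Build an ODBC-style connection string for a failover group listener."""
    parts = [
        f"Driver={{{driver}}}",      # renders as Driver={...}
        f"Server=tcp:{host},1433",   # 1433 is the default SQL port
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Declare read intent, as recommended for the read-only listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Placeholder failover group and database names.
    print(build_connection_string(
        "myfog.secondary.database.chinacloudapi.cn", "mydb", read_only=True))
```

Keeping the read-only and read-write connection strings separate in application configuration makes it clear which workloads can tolerate stale data.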

Preparing for performance degradation

A typical Azure application uses multiple Azure services and consists of multiple components. The automated failover of the failover group is triggered based on the state of the Azure SQL components alone. Other Azure services in the primary region may not be affected by the outage, and their components may still be available in that region. Once the primary databases switch to the DR region, the latency between the dependent components may increase. To avoid the impact of higher latency on the application's performance, ensure the redundancy of all the application's components in the DR region and follow these network security guidelines.

Preparing for data loss

If an outage is detected, Azure waits for the period you specified in GracePeriodWithDataLossHours. The default value is 1 hour. If you cannot afford data loss, make sure to set GracePeriodWithDataLossHours to a sufficiently large number, such as 24 hours. Use manual group failover to fail back from the secondary to the primary.
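The effect of the grace period can be illustrated with a short Python sketch that computes the earliest moment at which an automatic failover could be triggered after an outage is detected. The function name is hypothetical, and the model is simplified: it only encodes the default of 1 hour and the one-hour minimum, and ignores the verification that the outage cannot be mitigated by the built-in high availability infrastructure.

```python
from datetime import datetime, timedelta

MIN_GRACE_PERIOD_HOURS = 1  # the grace period cannot be set below one hour


def earliest_automatic_failover(
    detected_at: datetime, grace_period_hours: int = 1
) -> datetime:
    """Return the earliest time automatic failover could start.

    Simplified model: detection time plus GracePeriodWithDataLossHours.
    """
    if grace_period_hours < MIN_GRACE_PERIOD_HOURS:
        raise ValueError("grace period cannot be set below one hour")
    return detected_at + timedelta(hours=grace_period_hours)


if __name__ == "__main__":
    detected = datetime(2021, 1, 1, 12, 0)  # arbitrary example timestamp
    print(earliest_automatic_failover(detected))       # default grace period
    print(earliest_automatic_failover(detected, 24))   # data-loss-averse setting
```

A longer grace period trades a longer potential outage for a lower chance of a failover that loses data.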

Important

Elastic pools with 800 or fewer DTUs and more than 250 databases using geo-replication may encounter issues including longer planned failovers and degraded performance. These issues are more likely to occur for write-intensive workloads, when geo-replication endpoints are widely separated by geography, or when multiple secondary endpoints are used for each database. A symptom of these issues is geo-replication lag that increases over time. This lag can be monitored using sys.dm_geo_replication_link_status. If these issues occur, mitigations include increasing the number of pool DTUs or reducing the number of geo-replicated databases in the same pool.
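One way to watch for growing geo-replication lag is to sample the replication_lag_sec column of sys.dm_geo_replication_link_status at intervals and check the trend. The sketch below keeps the query as a string and applies a deliberately simple heuristic to samples you have already collected. The heuristic, the function name, and the sample values are assumptions for illustration, not a recommended alerting rule.

```python
from typing import Sequence

# Query to run against the primary database; executing it is left to
# whichever SQL client library the application already uses.
LAG_QUERY = """
SELECT partner_server, partner_database, replication_lag_sec, last_replication
FROM sys.dm_geo_replication_link_status;
"""


def lag_is_growing(samples: Sequence[float], min_samples: int = 3) -> bool:
    """Return True if the lag strictly increased across consecutive samples.

    Assumed heuristic: require at least min_samples readings, each larger
    than the one before it.
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Hypothetical replication_lag_sec readings taken a few minutes apart.
    print(lag_is_growing([2, 5, 9, 20]))
    print(lag_is_growing([4, 3, 6, 5]))
```

In practice a sustained upward trend over a longer window is more meaningful than a few consecutive readings, so the threshold and window should be tuned to the workload.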

Changing secondary region of the failover group

To illustrate the change sequence, we will assume that server A is the primary server, server B is the existing secondary server, and server C is the new secondary in the third region. To make the transition, follow these steps:

  1. Create additional secondaries of each database on server A to server C using active geo-replication. Each database on server A will have two secondaries, one on server B and one on server C. This will guarantee that the primary databases remain protected during the transition.
  2. Delete the failover group. At this point the logins will be failing. This is because the SQL aliases for the failover group listeners have been deleted and the gateway will not recognize the failover group name.
  3. Re-create the failover group with the same name between servers A and C. At this point the logins will stop failing.
  4. Add all primary databases on server A to the new failover group.
  5. Drop server B. All databases on B will be deleted automatically.

Changing primary region of the failover group

To illustrate the change sequence, we will assume that server A is the primary server, server B is the existing secondary server, and server C is the new primary in the third region. To make the transition, follow these steps:

  1. Perform a planned failover to switch the primary server to B. Server A will become the new secondary server. The failover may result in several minutes of downtime. The actual time will depend on the size of the failover group.
  2. Create additional secondaries of each database on server B to server C using active geo-replication. Each database on server B will have two secondaries, one on server A and one on server C. This will guarantee that the primary databases remain protected during the transition.
  3. Delete the failover group. At this point the logins will be failing. This is because the SQL aliases for the failover group listeners have been deleted and the gateway will not recognize the failover group name.
  4. Re-create the failover group with the same name between servers B and C. At this point the logins will stop failing.
  5. Add all primary databases on B to the new failover group.
  6. Perform a planned failover of the failover group to switch B and C. Now server C will become the primary and B the secondary. All secondary databases on server A will be automatically linked to the primaries on C. As in step 1, the failover may result in several minutes of downtime.
  7. Drop server A. All databases on A will be deleted automatically.

Important

When the failover group is deleted, the DNS records for the listener endpoints are also deleted. At that point, there is a non-zero probability of somebody else creating a failover group or server alias with the same name, which will prevent you from using it again. To minimize the risk, don't use generic failover group names.

Best practices for SQL Managed Instance

The auto-failover group must be configured on the primary instance and will connect it to the secondary instance in a different Azure region. All databases in the instance will be replicated to the secondary instance.

The following diagram illustrates a typical configuration of a geo-redundant cloud application using a managed instance and an auto-failover group.

[Diagram: auto-failover]

Note

See Add managed instance to a failover group for a detailed step-by-step tutorial on adding a SQL Managed Instance to a failover group.

If your application uses SQL Managed Instance as the data tier, follow these general guidelines when designing for business continuity:

创建辅助实例Creating the secondary instance

若要确保故障转移后与主要 SQL 托管实例的连接不中断,主要实例和辅助实例必须位于同一 DNS 区域。To ensure non-interrupted connectivity to the primary SQL Managed Instance after failover both the primary and secondary instances must be in the same DNS zone. 将会保证同一个多域 (SAN) 证书可用于对与故障转移组中的两个实例之一建立的客户端连接进行身份验证。It will guarantee that the same multi-domain (SAN) certificate can be used to authenticate the client connections to either of the two instances in the failover group. 准备好将应用程序部署到生产环境后,在不同的区域中创建一个辅助 SQL 托管实例,并确保它与主要 SQL 托管实例共享 DNS 区域。When your application is ready for production deployment, create a secondary SQL Managed Instance in a different region and make sure it shares the DNS zone with the primary SQL Managed Instance. 为此,可以使用 Azure 门户、PowerShell 或 REST API 指定可选 DNS Zone Partner 参数。You can do it by specifying the optional DNS Zone Partner parameter using the Azure portal, PowerShell, or the REST API.

Important

The first managed instance created in the subnet determines the DNS zone for all subsequent instances in the same subnet. This means that two instances from the same subnet cannot belong to different DNS zones.

For more information about creating the secondary SQL Managed Instance in the same DNS zone as the primary instance, see Create a secondary managed instance.

Enabling replication traffic between two instances

Because each instance is isolated in its own VNet, two-directional traffic between these VNets must be allowed. See Azure VPN gateway.

Creating a failover group between managed instances in different subscriptions

You can create a failover group between SQL Managed Instances in two different subscriptions, as long as the subscriptions are associated with the same Azure Active Directory tenant. When using the PowerShell API, you can do this by specifying the PartnerSubscriptionId parameter for the secondary SQL Managed Instance. When using the REST API, each instance ID included in the properties.managedInstancePairs parameter can have its own subscriptionID.

Important

The Azure portal does not support the creation of failover groups across different subscriptions. Also, for existing failover groups across different subscriptions and/or resource groups, failover cannot be initiated manually via the portal from the primary SQL Managed Instance. Initiate it from the geo-secondary instance instead.

Managing failover to secondary instance

The failover group manages the failover of all the databases in the SQL Managed Instance. When a group is created, each database in the instance is automatically geo-replicated to the secondary SQL Managed Instance. You cannot use failover groups to initiate a partial failover of a subset of the databases.

Important

If a database is removed from the primary SQL Managed Instance, it is also dropped automatically on the geo-secondary SQL Managed Instance.

Using read-write listener for OLTP workload

When performing OLTP operations, use <fog-name>.<zone_id>.database.chinacloudapi.cn as the server URL; connections are automatically directed to the primary. This URL does not change after the failover. The failover involves updating the DNS record, so client connections are redirected to the new primary only after the client DNS cache is refreshed. Because the secondary instance shares the DNS zone with the primary, the client application can reconnect to it using the same SAN certificate.
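Because redirection depends on the client's DNS cache refreshing, connection attempts made right after a failover may fail for a short time. The following is a minimal sketch of client-side retry with exponential backoff. The `connect` callable and the use of `ConnectionError` are assumptions for illustration; a real driver raises its own exception types, and the attempt count and delays would need tuning.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical zero-argument callable that opens a connection
    to the read-write listener and raises ConnectionError when it cannot.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Do not sleep after the final failed attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.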

Using read-only listener to connect to the secondary instance

If you have a logically isolated read-only workload that tolerates some staleness of data, you can use the secondary database in the application. To connect directly to the geo-replicated secondary, use <fog-name>.secondary.<zone_id>.database.chinacloudapi.cn as the server URL.

Note

In certain service tiers, SQL Database supports the use of read-only replicas to load balance read-only query workloads using the capacity of one read-only replica and the ApplicationIntent=ReadOnly parameter in the connection string. When you have configured a geo-replicated secondary, you can use this capability to connect to a read-only replica in either the primary location or the geo-replicated location.

  • To connect to a read-only replica in the primary location, use <fog-name>.<zone_id>.database.chinacloudapi.cn.
  • To connect to a read-only replica in the secondary location, use <fog-name>.secondary.<zone_id>.database.chinacloudapi.cn.
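The two listener host names differ only by the `secondary` label, so they can be derived from the failover group name and the DNS zone ID. This is a small sketch; the function name and the sample values `my-fog` and `a1b2c3d4e5f6` are made up for illustration.

```python
DNS_SUFFIX = "database.chinacloudapi.cn"  # suffix used in this article


def listener_hosts(fog_name: str, zone_id: str) -> dict:
    """Return the read-write and read-only listener host names."""
    for label, value in (("fog_name", fog_name), ("zone_id", zone_id)):
        # Each part must be a single DNS label, so it cannot be empty or contain dots.
        if not value or "." in value:
            raise ValueError(f"{label} must be a single non-empty DNS label")
    return {
        "read_write": f"{fog_name}.{zone_id}.{DNS_SUFFIX}",
        "read_only": f"{fog_name}.secondary.{zone_id}.{DNS_SUFFIX}",
    }


hosts = listener_hosts("my-fog", "a1b2c3d4e5f6")  # hypothetical names
print(hosts["read_write"])
print(hosts["read_only"])
```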

Preparing for performance degradation

A typical Azure application uses multiple Azure services and consists of multiple components. The automated failover of the failover group is triggered based on the state of the Azure SQL components alone. Other Azure services in the primary region may not be affected by the outage, and their components may still be available in that region. Once the primary databases switch to the DR region, the latency between the dependent components may increase. To avoid the impact of higher latency on the application's performance, ensure the redundancy of all the application's components in the DR region and follow these network security guidelines.

Preparing for data loss

If an outage is detected, a read-write failover is triggered if there is zero data loss, to the best of our knowledge. Otherwise, it waits for the period you specified by GracePeriodWithDataLossHours. If you specified GracePeriodWithDataLossHours, be prepared for data loss. In general, during outages, Azure favors availability. If you cannot afford data loss, make sure to set GracePeriodWithDataLossHours to a sufficiently large number, such as 24 hours.
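The rule above can be modeled as a small decision function. This is only a sketch of the behavior as described in this section, not the service's actual implementation; the function name, parameters, and return strings are invented for illustration.

```python
from datetime import datetime, timedelta


def failover_decision(outage_start: datetime, now: datetime,
                      data_loss_expected: bool, grace_period_hours: float) -> str:
    """Model the documented rule for automatic read-write failover."""
    if not data_loss_expected:
        # No data loss is expected: fail over without waiting.
        return "fail over now"
    # Otherwise wait out the grace period (GracePeriodWithDataLossHours).
    deadline = outage_start + timedelta(hours=grace_period_hours)
    if now >= deadline:
        return "fail over, accepting possible data loss"
    return "wait"
```

With a grace period of 24 hours, an outage that would lose data leads to `"wait"` for the first day, which is why a large value favors data safety over availability.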

The DNS update of the read-write listener happens immediately after the failover is initiated. This operation does not result in data loss. However, the process of switching database roles can take up to 5 minutes under normal conditions. Until it is completed, some databases in the new primary instance are still read-only. If failover is initiated using PowerShell, the entire operation is synchronous. If it is initiated using the Azure portal, the UI indicates completion status. If it is initiated using the REST API, use the standard Azure Resource Manager polling mechanism to monitor for completion.

Important

Use manual group failover to move primaries back to the original location. When the outage that caused the failover is mitigated, you can move your primary databases to the original location by initiating a manual failover of the group.

Changing secondary region of the failover group

Assume that instance A is the primary instance, instance B is the existing secondary instance, and instance C is the new secondary instance in a third region. To make the transition, follow these steps:

  1. Create instance C with the same size as A and in the same DNS zone.
  2. Delete the failover group between instances A and B. At this point, logins fail because the SQL aliases for the failover group listeners have been deleted and the gateway does not recognize the failover group name. The secondary databases are disconnected from the primaries and become read-write databases.
  3. Create a failover group with the same name between instances A and C. Follow the instructions in the failover group with SQL Managed Instance tutorial. This is a size-of-data operation and completes when all databases from instance A are seeded and synchronized.
  4. Delete instance B if not needed to avoid unnecessary charges.

Note

After step 2 and until step 3 is completed, the databases in instance A remain unprotected from a catastrophic failure of instance A.

Changing primary region of the failover group

Assume that instance A is the primary instance, instance B is the existing secondary instance, and instance C is the new primary instance in a third region. To make the transition, follow these steps:

  1. Create instance C with the same size as B and in the same DNS zone.
  2. Connect to instance B and manually fail over to switch the primary instance to B. Instance A becomes the new secondary instance automatically.
  3. Delete the failover group between instances A and B. At this point, logins fail because the SQL aliases for the failover group listeners have been deleted and the gateway does not recognize the failover group name. The secondary databases are disconnected from the primaries and become read-write databases.
  4. Create a failover group with the same name between instances B and C. Follow the instructions in the failover group with managed instance tutorial. This is a size-of-data operation and completes when all databases from instance B are seeded and synchronized.
  5. Manually fail over the group to switch the primary instance to C. Instance B becomes the new secondary instance automatically.
  6. Delete instance A if not needed to avoid unnecessary charges.

Caution

After step 3 and until step 4 is completed, the databases in instance B remain unprotected from a catastrophic failure of instance B.

Important

When the failover group is deleted, the DNS records for the listener endpoints are also deleted. At that point, somebody else could create a failover group or server alias with the same name, which would prevent you from using that name again. To minimize the risk, don't use generic failover group names.

Enable scenarios dependent on objects from the system databases

System databases are not replicated to the secondary instance in a failover group. To enable scenarios that depend on objects from the system databases, make sure to create the same objects on the secondary instance. For example, if you plan to use the same logins on the secondary instance, make sure to create them with the identical SID.

-- Code to create login on the secondary instance
CREATE LOGIN foo WITH PASSWORD = '<enterStrongPasswordHere>', SID = <login_sid>;
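When several logins need to be recreated, the statement above can be generated from each login's name and binary SID. The sketch below assumes the SID has already been read from the primary instance as raw bytes (for example, from the `sid` column of `sys.sql_logins`); the function name is invented, and the password placeholder is left for you to fill in.

```python
def create_login_sql(login_name: str, sid: bytes) -> str:
    """Build a CREATE LOGIN statement that reuses an existing SID."""
    if not login_name:
        raise ValueError("login_name must not be empty")
    if not sid:
        raise ValueError("sid must not be empty")
    # Bracket-quote the identifier, doubling any closing bracket inside it.
    quoted = "[" + login_name.replace("]", "]]") + "]"
    # T-SQL binary literals are written as 0x followed by hex digits.
    sid_literal = "0x" + sid.hex().upper()
    return (
        f"CREATE LOGIN {quoted} "
        f"WITH PASSWORD = '<enterStrongPasswordHere>', SID = {sid_literal};"
    )


print(create_login_sql("foo", bytes.fromhex("0105abcd")))
```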

Failover groups and network security

For some applications, the security rules require that network access to the data tier is restricted to a specific component or components, such as a VM or a web service. This requirement presents some challenges for business continuity design and the use of failover groups. Consider the following options when implementing such restricted access.

Using failover groups and virtual network rules

If you are using Virtual Network service endpoints and rules to restrict access to your database in SQL Database or SQL Managed Instance, be aware that each virtual network service endpoint applies to only one Azure region. The endpoint does not enable other regions to accept communication from the subnet. Therefore, only client applications deployed in the same region can connect to the primary database. Because failover reroutes the SQL Database client sessions to a server in a different (secondary) region, these sessions fail if they originate from a client outside of that region. For that reason, the automatic failover policy cannot be enabled if the participating servers or instances are included in the Virtual Network rules. To support manual failover, follow these steps:

  1. Provision redundant copies of the front-end components of your application (web service, virtual machines, and so on) in the secondary region
  2. Configure the virtual network rules individually for the primary and secondary server
  3. Enable the front-end failover using a Traffic Manager configuration
  4. Initiate manual failover when the outage is detected.

This option is optimized for applications that require consistent latency between the front end and the data tier, and it supports recovery when the front end, the data tier, or both are impacted by the outage.

Note

If you are using the read-only listener to load-balance a read-only workload, make sure that this workload is executed in a VM or other resource in the secondary region so it can connect to the secondary database.

Use failover groups and firewall rules

If your business continuity plan requires failover using groups with automatic failover, you can restrict access to your database in SQL Database by using the traditional firewall rules. To support automatic failover, follow these steps:

  1. Create a public IP.
  2. Create a public load balancer and assign the public IP to it.
  3. Create a virtual network and the virtual machines for your front-end components.
  4. Create a network security group and configure inbound connections.
  5. Ensure that the outbound connections are open to Azure SQL Database by using the 'Sql' service tag.
  6. Create a SQL Database firewall rule to allow inbound traffic from the public IP address you created in step 1.

For more information on how to configure outbound access and what IP to use in the firewall rules, see Load balancer outbound connections.

The above configuration ensures that automatic failover does not block connections from the front-end components. It assumes that the application can tolerate the longer latency between the front end and the data tier.

Important

To guarantee business continuity for regional outages, you must ensure geographic redundancy for both front-end components and the databases.

Enabling geo-replication between managed instances and their VNets

When you set up a failover group between primary and secondary SQL Managed Instances in two different regions, each instance is isolated using an independent virtual network. To allow replication traffic between these VNets, ensure these prerequisites are met:

  • The two instances of SQL Managed Instance need to be in different Azure regions.

  • The two instances of SQL Managed Instance need to be the same service tier and have the same storage size.

  • Your secondary instance of SQL Managed Instance must be empty (no user databases).

  • The virtual networks used by the instances of SQL Managed Instance need to be connected through a VPN Gateway or ExpressRoute. When two virtual networks connect through an on-premises network, ensure there is no firewall rule blocking port 5022 and ports 11000-11999. Global VNet Peering is not supported.

  • The two SQL Managed Instance VNets cannot have overlapping IP addresses.

  • You need to set up your Network Security Groups (NSG) such that port 5022 and the port range 11000-12000 are open inbound and outbound for connections from the subnet of the other managed instance. This is to allow replication traffic between the instances.

    Important

    Misconfigured NSG security rules lead to stuck database copy operations.

  • The secondary SQL Managed Instance is configured with the correct DNS zone ID. DNS zone is a property of a SQL Managed Instance and its underlying virtual cluster, and its ID is included in the host name address. The zone ID is generated as a random string when the first SQL Managed Instance is created in each VNet, and the same ID is assigned to all other instances in the same subnet. Once assigned, the DNS zone cannot be modified. SQL Managed Instances included in the same failover group must share the DNS zone. You accomplish this by passing the primary instance's zone ID as the value of the DnsZonePartner parameter when creating the secondary instance.

    Note

    For a detailed tutorial on configuring failover groups with SQL Managed Instance, see add a SQL Managed Instance to a failover group.
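Two of the prerequisites above, non-overlapping VNet address ranges and open replication ports, can be checked before creating the group. The following is a minimal sketch using Python's standard `ipaddress` module. The function names and the way NSG rules are represented (as inclusive port ranges) are assumptions for illustration, and the required ports are taken as 5022 plus 11000-11999 from the firewall bullet.

```python
import ipaddress

# Ports named in the prerequisites, as inclusive (low, high) ranges.
REQUIRED_PORTS = [(5022, 5022), (11000, 11999)]


def vnets_overlap(primary_cidrs, secondary_cidrs) -> bool:
    """Return True if any primary address range overlaps a secondary one."""
    primary = [ipaddress.ip_network(c) for c in primary_cidrs]
    secondary = [ipaddress.ip_network(c) for c in secondary_cidrs]
    return any(a.overlaps(b) for a in primary for b in secondary)


def missing_ports(allowed_ranges):
    """Return the required ranges not fully covered by a single allowed range."""
    return [
        (low, high)
        for low, high in REQUIRED_PORTS
        if not any(a_low <= low and high <= a_high
                   for a_low, a_high in allowed_ranges)
    ]


print(vnets_overlap(["10.0.0.0/16"], ["10.1.0.0/16"]))
print(missing_ports([(5022, 5022)]))
```

The port check is deliberately simple: it does not merge adjacent allowed ranges, so a required range split across two rules is reported as missing.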

Upgrading or downgrading a primary database

You can upgrade or downgrade a primary database to a different compute size (within the same service tier, not between General Purpose and Business Critical) without disconnecting any secondary databases. When upgrading, we recommend that you upgrade all of the secondary databases first, and then upgrade the primary. When downgrading, reverse the order: downgrade the primary first, and then downgrade all of the secondary databases. When you upgrade or downgrade the database to a different service tier, this recommendation is enforced.

This sequence is recommended specifically to avoid the problem where a secondary at a lower SKU gets overloaded and must be re-seeded during an upgrade or downgrade process. You could also avoid the problem by making the primary read-only, at the expense of impacting all read-write workloads against the primary.
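The recommended ordering can be expressed as a short helper. This sketch assumes compute sizes can be compared as numbers (for example, vCore counts); the function and database names are hypothetical.

```python
def scaling_order(current_size: int, target_size: int,
                  primary: str, secondaries: list) -> list:
    """Return the order in which databases should be resized."""
    if target_size > current_size:
        # Upgrade: secondaries first, so no secondary lags behind a larger primary.
        return list(secondaries) + [primary]
    if target_size < current_size:
        # Downgrade: primary first, then the secondaries.
        return [primary] + list(secondaries)
    return []  # Same size: nothing to do.


print(scaling_order(4, 8, "db-primary", ["db-secondary"]))
print(scaling_order(8, 4, "db-primary", ["db-secondary"]))
```

In both directions the secondary is never smaller than the primary while the change is in progress, which is the point of the recommendation.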

Note

If you created a secondary database as part of the failover group configuration, we do not recommend downgrading the secondary database. This ensures your data tier has sufficient capacity to process your regular workload after failover is activated.

Preventing the loss of critical data

Due to the high latency of wide area networks, continuous copy uses an asynchronous replication mechanism. Asynchronous replication makes some data loss unavoidable if a failure occurs. However, some applications may require no data loss. To protect these critical updates, an application developer can call the sp_wait_for_database_copy_sync system procedure immediately after committing the transaction. Calling sp_wait_for_database_copy_sync blocks the calling thread until the last committed transaction has been transmitted to the secondary database. However, it does not wait for the transmitted transactions to be replayed and committed on the secondary. sp_wait_for_database_copy_sync is scoped to a specific continuous copy link. Any user with connection rights to the primary database can call this procedure.

Note

sp_wait_for_database_copy_sync prevents data loss after failover, but does not guarantee full synchronization for read access. The delay caused by a sp_wait_for_database_copy_sync procedure call can be significant and depends on the size of the transaction log at the time of the call.
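The commit-then-wait pattern described above might look like the following from application code. This is a sketch only: it assumes a Python DB-API connection to the primary database whose driver uses `?` parameter markers (as pyodbc does), and it has not been run against a live server. The function name and the server and database names passed to it are hypothetical.

```python
WAIT_SQL = (
    "EXEC sys.sp_wait_for_database_copy_sync "
    "@target_server = ?, @target_database = ?"
)


def commit_and_wait_for_secondary(connection, target_server: str,
                                  target_database: str) -> None:
    """Commit the current transaction, then block until it reaches the secondary."""
    # Commit first: the procedure waits for the last committed transaction.
    connection.commit()
    cursor = connection.cursor()
    try:
        # Blocks until the committed log has been transmitted to the secondary.
        # It does not wait for the secondary to replay it.
        cursor.execute(WAIT_SQL, (target_server, target_database))
    finally:
        cursor.close()
```

Because the call can block for a noticeable time, it is best reserved for the few transactions that truly cannot be lost rather than applied to every commit.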

Failover groups and point-in-time restore

For information about using point-in-time restore with failover groups, see Point in Time Recovery (PITR).

Limitations of failover groups

Be aware of the following limitations:

  • Failover groups cannot be created between two servers or instances in the same Azure region.
  • Failover groups cannot be renamed. You need to delete the group and re-create it with a different name.
  • Database rename is not supported for instances in a failover group. You need to temporarily delete the failover group to be able to rename a database.
  • System databases are not replicated to the secondary instance in a failover group. Therefore, scenarios that depend on objects from the system databases are impossible on the secondary instance unless the objects are manually created on the secondary.

Programmatically managing failover groups

As discussed previously, auto-failover groups and active geo-replication can also be managed programmatically using Azure PowerShell and the REST API. The following tables describe the set of commands available. Active geo-replication includes a set of Azure Resource Manager APIs for management, including the Azure SQL Database REST API and Azure PowerShell cmdlets. These APIs require the use of resource groups and support role-based security (RBAC). For more information on how to implement access roles, see Azure role-based access control (Azure RBAC).

Manage SQL Database failover

Cmdlet | Description
New-AzSqlDatabaseFailoverGroup | Creates a failover group and registers it on both primary and secondary servers
Remove-AzSqlDatabaseFailoverGroup | Removes a failover group from the server
Get-AzSqlDatabaseFailoverGroup | Retrieves a failover group's configuration
Set-AzSqlDatabaseFailoverGroup | Modifies the configuration of a failover group
Switch-AzSqlDatabaseFailoverGroup | Triggers failover of a failover group to the secondary server
Add-AzSqlDatabaseToFailoverGroup | Adds one or more databases to a failover group

Manage SQL Managed Instance failover

Cmdlet | Description
New-AzSqlDatabaseInstanceFailoverGroup | Creates a failover group and registers it on both primary and secondary instances
Set-AzSqlDatabaseInstanceFailoverGroup | Modifies the configuration of a failover group
Get-AzSqlDatabaseInstanceFailoverGroup | Retrieves a failover group's configuration
Switch-AzSqlDatabaseInstanceFailoverGroup | Triggers failover of a failover group to the secondary instance
Remove-AzSqlDatabaseInstanceFailoverGroup | Removes a failover group

Next steps