Using the RecoveryManager class to fix shard map problems

The RecoveryManager class lets ADO.NET applications detect and correct inconsistencies between the global shard map (GSM) and the local shard map (LSM) in a sharded database environment.

The GSM and LSM track the mapping of each database in a sharded environment. Occasionally, the GSM and the LSM fall out of sync. In that case, use the RecoveryManager class to detect and repair the inconsistency.

The RecoveryManager class is part of the Elastic Database client library.


For term definitions, see the Elastic Database tools glossary. To understand how the ShardMapManager is used to manage data in a sharded solution, see Shard map management.

Why use the recovery manager

In a sharded database environment, there is one tenant per database and many databases per server. There can also be many servers in the environment. Each database is mapped in the shard map, so calls can be routed to the correct server and database. Databases are tracked according to a sharding key, and each shard is assigned a range of key values. For example, a shard may be assigned the customer names from "D" to "F". The mappings of all shards (that is, databases) and their key ranges are contained in the global shard map (GSM). Each database also contains a map of the ranges it holds, known as the local shard map (LSM). When an app connects to a shard, the mapping is cached with the app for quick retrieval. The LSM is used to validate cached data.
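To make the idea of routing by key range concrete, here is a minimal in-memory sketch. It is not the client library's implementation: the list `globalMap`, the `Route` function, and the server and database names are all hypothetical, and ranges are assumed to be half-open, so `[D, G)` covers every name starting with "D" through "F".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a range shard map: each entry maps a
// half-open key range [Low, High) to a shard location "server/database".
var globalMap = new List<(string Low, string High, string Location)>
{
    ("A", "D", "server1/customers_a_c"),
    ("D", "G", "server1/customers_d_f"),
    ("G", "K", "server2/customers_g_j"),
};

// Returns the location whose range contains the key, or null if unmapped.
string? Route(string key)
{
    foreach (var (low, high, location) in globalMap)
    {
        bool inRange = string.CompareOrdinal(key, low) >= 0
                    && string.CompareOrdinal(key, high) < 0;
        if (inRange) return location;
    }
    return null;
}

Console.WriteLine(Route("Erikson") ?? "(no mapping)");
Console.WriteLine(Route("Zhang") ?? "(no mapping)");
```

A key that falls outside every range has no mapping, which is the situation an application sees when a range has been removed from the shard map.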

The GSM and LSM may become out of sync for the following reasons:

  1. A shard whose range is believed to no longer be in use is deleted, or a shard is renamed. Deleting a shard results in an orphaned shard mapping. Similarly, a renamed database can cause an orphaned shard mapping. Depending on the intent of the change, either the shard needs to be removed from the shard map or the shard location needs to be updated. To recover a deleted database, see Restore a deleted database.
  2. A geo-failover event occurs. To continue, you must update the server name and database name of the shard map manager in the application, and then update the shard mapping details for all shards in a shard map. If there is a geo-failover, such recovery logic should be automated within the failover workflow. Automating recovery actions makes geo-enabled databases easier to manage and avoids manual intervention. To learn about options to recover a database if there is a data center outage, see Business Continuity and Disaster Recovery.
  3. Either a shard or the ShardMapManager database is restored to an earlier point in time. To learn about point-in-time recovery using backups, see Recovery using backups.


Retrieving RecoveryManager from a ShardMapManager

The first step is to create a RecoveryManager instance. The GetRecoveryManager method returns the recovery manager for the current ShardMapManager instance. To address any inconsistencies in the shard map, you must first retrieve the RecoveryManager for the particular shard map.

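 // The load policy is an assumption here; Lazy loads shard maps and mappings as needed.
 ShardMapManager smm = ShardMapManagerFactory.GetSqlShardMapManager(smmConnectionString,
          ShardMapManagerLoadPolicy.Lazy);
 RecoveryManager rm = smm.GetRecoveryManager();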

In this example, the RecoveryManager is initialized from the ShardMapManager. The ShardMapManager containing a ShardMap is also already initialized.

Since this application code manipulates the shard map itself, the credentials used in the factory method (smmConnectionString in the preceding example) should have read-write permissions on the GSM database referenced by the connection string. These credentials are typically different from the credentials used to open connections for data-dependent routing. For more information, see Using credentials in the elastic database client.

Removing a shard from the ShardMap after a shard is deleted

The DetachShard method detaches the given shard from the shard map and deletes the mappings associated with the shard.

  • The location parameter is the shard location, specifically the server name and database name, of the shard being detached.
  • The shardMapName parameter is the shard map name. Optional: it is only required when multiple shard maps are managed by the same shard map manager.


Use this technique only if you are certain that the range for the removed mapping is empty. The method does not check whether the range still contains data, so it is best to include such checks in your code.

This example removes a shard from the shard map.

rm.DetachShard(s.Location, customerMap);

The shard map reflects the shard location in the GSM before the deletion of the shard. Because the shard was deleted, it is assumed that this was intentional and that the sharding key range is no longer in use. If not, you can execute a point-in-time restore to recover the shard from an earlier point in time. (In that case, review the following section to detect shard inconsistencies.) To recover, see Point in time recovery.

Since it is assumed the database deletion was intentional, the final administrative cleanup action is to delete the entry for the shard in the shard map manager. This prevents the application from inadvertently writing information to a range that is not expected.
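The advice to check a range before detaching it can be sketched as a small guard. This is only an illustration under stated assumptions: `DetachIfRangeEmpty` is a hypothetical helper, not part of the client library, and both delegates are stand-ins. In a real tool, the first delegate would query the shard for rows in the key range (which is only possible while the database is still reachable), and the second would wrap the call to rm.DetachShard.

```csharp
using System;

// Hypothetical guard: detaches only when the caller-supplied row counter reports
// that no rows remain in the shard's key range. Returns true if detach ran.
bool DetachIfRangeEmpty(Func<long> countRowsInRange, Action detach)
{
    long remaining = countRowsInRange();
    if (remaining > 0)
    {
        Console.WriteLine($"Skipping detach: {remaining} row(s) still in range.");
        return false;
    }
    detach();
    return true;
}

// Usage with stand-in delegates. In a real tool, countRowsInRange would query the
// shard and detach would call rm.DetachShard(s.Location, customerMap).
bool detached = false;
bool first = DetachIfRangeEmpty(() => 42, () => detached = true);
bool second = DetachIfRangeEmpty(() => 0, () => detached = true);
Console.WriteLine($"first={first}, second={second}, detached={detached}");
```

Passing the check and the action as delegates keeps the guard independent of how the row count is obtained, so the same helper can be reused from existing operations tooling.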

To detect mapping differences

The DetectMappingDifferences method compares the mappings in the GSM and the LSM for a given shard location and returns a collection of recovery tokens that identify the differences found. Pass each token to ResolveMappingDifferences to reconcile the two maps.

rm.DetectMappingDifferences(location, shardMapName);

  • The location parameter specifies the server name and database name.
  • The shardMapName parameter is the shard map name. Optional: it is only required if multiple shard maps are managed by the same shard map manager.
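Conceptually, detection amounts to comparing what the GSM records for a shard with what the shard's own LSM records. The sketch below illustrates that comparison with plain collections. It is not the library's algorithm: `Classify`, the range strings, and the status labels are hypothetical, and real recovery tokens are opaque objects rather than strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: the key ranges each map records for one shard.
var gsmRanges = new HashSet<string> { "[D,G)", "[G,K)" };
var lsmRanges = new HashSet<string> { "[D,G)", "[K,N)" };

// Classifies every range seen in either map by where it is recorded.
Dictionary<string, string> Classify(ISet<string> global, ISet<string> local)
{
    var result = new Dictionary<string, string>();
    foreach (var range in global.Union(local))
    {
        bool inGlobal = global.Contains(range);
        bool inLocal = local.Contains(range);
        result[range] = inGlobal && inLocal ? "GSM and LSM"
                      : inGlobal ? "GSM only"
                      : "LSM only";
    }
    return result;
}

foreach (var (range, status) in Classify(gsmRanges, lsmRanges))
    Console.WriteLine($"{range}: {status}");
```

Any range reported as present in only one map is an inconsistency that needs to be resolved by choosing which map to trust.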

To resolve mapping differences

The ResolveMappingDifferences method selects one of the shard maps (either local or global) as the source of truth and reconciles mappings on both shard maps (GSM and LSM).

rm.ResolveMappingDifferences(RecoveryToken, MappingDifferenceResolution.KeepShardMapping);

  • The RecoveryToken parameter identifies the differences in the mappings between the GSM and the LSM for the specific shard.
  • The MappingDifferenceResolution enumeration indicates the method for resolving the difference between the shard mappings.
  • MappingDifferenceResolution.KeepShardMapping is recommended when the LSM contains the accurate mapping, so the mapping in the shard should be used. This is typically the case after a failover: the shard now resides on a new server. Since the shard must first be removed from the GSM (using the RecoveryManager.DetachShard method), a mapping no longer exists in the GSM. Therefore, the LSM must be used to re-establish the shard mapping.
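The effect of trusting the LSM can be pictured with a small in-memory model. This is a conceptual sketch only, not how the library implements resolution: the `KeepShardMapping` function below, the dictionary standing in for the GSM, and all range and server names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the GSM: key range -> shard location.
var gsm = new Dictionary<string, string>
{
    ["[A,D)"] = "server1/shardA",
    ["[D,G)"] = "server2/shardB",
    ["[G,K)"] = "server2/shardB", // stale: the shard no longer holds this range
};
// Hypothetical stand-in for the LSM stored on server2/shardB.
var lsmRanges = new[] { "[D,G)", "[K,N)" };

// Makes the global map agree with what the shard itself records.
void KeepShardMapping(Dictionary<string, string> global, string location,
                      IEnumerable<string> localRanges)
{
    // Drop everything the global map currently records for this shard...
    var stale = global.Where(kv => kv.Value == location).Select(kv => kv.Key).ToList();
    foreach (var range in stale) global.Remove(range);
    // ...then copy in the ranges the shard reports.
    foreach (var range in localRanges) global[range] = location;
}

KeepShardMapping(gsm, "server2/shardB", lsmRanges);
foreach (var (range, loc) in gsm.OrderBy(kv => kv.Key))
    Console.WriteLine($"{range} -> {loc}");
```

Mappings that belong to other shards are left untouched, which mirrors the fact that a recovery token is scoped to a specific shard.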

Attach a shard to the ShardMap after a shard is restored

The AttachShard method attaches the given shard to the shard map. It then detects any shard map inconsistencies and updates the mappings to match the shard at the point of the shard restoration. It is assumed that the database is also renamed to reflect the original database name (before the shard was restored), since a point-in-time restore defaults to a new database name with the timestamp appended.

rm.AttachShard(location, shardMapName);

  • The location parameter is the server name and database name of the shard being attached.
  • The shardMapName parameter is the shard map name. Optional: it is only required when multiple shard maps are managed by the same shard map manager.

This example adds to the shard map a shard that was recently restored from an earlier point in time. Since the shard (namely the mapping for the shard in the LSM) has been restored, it is potentially inconsistent with the shard entry in the GSM. Outside of this example code, the shard was restored and renamed to the original name of the database. Since it was restored, it is assumed that the mapping in the LSM is the trusted mapping.

rm.AttachShard(s.Location, customerMap);
// Detect GSM/LSM differences for the restored shard, then trust the LSM.
var gs = rm.DetectMappingDifferences(s.Location);
foreach (RecoveryToken g in gs)
{
    rm.ResolveMappingDifferences(g, MappingDifferenceResolution.KeepShardMapping);
}

Updating shard locations after a geo-failover (restore) of the shards

If there is a geo-failover, the secondary database is made write accessible and becomes the new primary database. The name of the server, and potentially of the database (depending on your configuration), may be different from the original primary. Therefore the mapping entries for the shard in the GSM and LSM must be fixed. Similarly, if the database is restored to a different name or location, or to an earlier point in time, this might cause inconsistencies in the shard maps. The Shard Map Manager handles the distribution of open connections to the correct database. Distribution is based on the data in the shard map and the value of the sharding key that is the target of the application's request. After a geo-failover, this information must be updated with the accurate server name, database name, and shard mapping of the recovered database.

Best practices

Geo-failover and recovery are operations typically managed by a cloud administrator of the application, who intentionally uses one of the Azure SQL Database business continuity features. Business continuity planning requires processes, procedures, and measures to ensure that business operations can continue without interruption. The methods available as part of the RecoveryManager class should be used within this workflow to ensure that the GSM and LSM are kept up to date based on the recovery action taken. There are five basic steps to ensure that the GSM and LSM reflect accurate information after a failover event. The application code to execute these steps can be integrated into existing tools and workflows.

  1. Retrieve the RecoveryManager from the ShardMapManager.
  2. Detach the old shard from the shard map.
  3. Attach the new shard to the shard map, including the new shard location.
  4. Detect inconsistencies in the mapping between the GSM and LSM.
  5. Resolve differences between the GSM and the LSM, trusting the LSM.

This example performs the following steps:

  1. Removes shards from the shard map that reflect shard locations before the failover event.

  2. Attaches shards to the shard map reflecting the new shard locations (the parameter "Configuration.SecondaryServer" is the new server name; the database name stays the same).

  3. Retrieves the recovery tokens by detecting mapping differences between the GSM and the LSM for each shard.

  4. Resolves the inconsistencies by trusting the mapping from the LSM of each shard.

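    // Assumption: shardMap is a ShardMap already retrieved from smm.
    var shards = shardMap.GetShards();
    foreach (Shard s in shards)
    {
        if (s.Location.Server == Configuration.PrimaryServer)
        {
            ShardLocation slNew = new ShardLocation(Configuration.SecondaryServer, s.Location.Database);
            rm.DetachShard(s.Location);   // remove the pre-failover location
            rm.AttachShard(slNew);        // attach the post-failover location
            foreach (RecoveryToken g in rm.DetectMappingDifferences(slNew))
                rm.ResolveMappingDifferences(g, MappingDifferenceResolution.KeepShardMapping);
        }
    }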

Additional resources

Not using elastic database tools yet? Check out the Getting Started Guide.