Moving data between scaled-out cloud databases

Applies to: Azure SQL Database

If you are a Software as a Service developer and your app suddenly undergoes tremendous demand, you need to accommodate the growth. So you add more databases (shards). How do you redistribute the data to the new databases without disrupting data integrity? Use the split-merge tool to move data from constrained databases to the new databases.

The split-merge tool runs as an Azure web service. An administrator or developer uses the tool to move shardlets (data from a shard) between different databases (shards). The tool uses shard map management to maintain the service metadata database and to ensure consistent mappings.

[Figure: Overview]

Download

Microsoft.Azure.SqlDatabase.ElasticScale.Service.SplitMerge

Documentation

  1. Elastic database split-merge tool tutorial
  2. Split-merge security configuration
  3. Split-merge security considerations
  4. Shard map management
  5. Migrate existing databases to scale-out
  6. Elastic database tools
  7. Elastic database tools glossary

Why use the split-merge tool

  • Flexibility

    Applications need to stretch flexibly beyond the limits of a single database in Azure SQL Database. Use the tool to move data as needed to new databases while retaining integrity.

  • Split to grow

    To increase overall capacity to handle explosive growth, create additional capacity by sharding the data and distributing it across incrementally more databases until capacity needs are fulfilled. This is a prime example of the split feature.

  • Merge to shrink

    Capacity needs shrink due to the seasonal nature of a business. The tool lets you scale down to fewer scale units when business slows. The merge feature in the Elastic Scale split-merge service covers this requirement.

  • Manage hotspots by moving shardlets

    With multiple tenants per database, the allocation of shardlets to shards can lead to capacity bottlenecks on some shards. This requires re-allocating shardlets or moving busy shardlets to new or less utilized shards.

Concepts & key features

  • Customer-hosted services

    The split-merge tool is delivered as a customer-hosted service. You must deploy and host the service in your Azure subscription. The package you download from NuGet contains a configuration template to complete with the information for your specific deployment. See the split-merge tutorial for details. Since the service runs in your Azure subscription, you can control and configure most security aspects of the service. The default template includes options to configure TLS, certificate-based client authentication, encryption for stored credentials, DoS guarding, and IP restrictions. For more information on the security aspects, see Split-merge security configuration.

    The default deployed service runs with one worker role and one web role. Each uses the A1 VM size in Azure Cloud Services. While you cannot modify these settings when deploying the package, you can change them after a successful deployment in the running cloud service (through the Azure portal). Note that, for technical reasons, the worker role must not be configured for more than a single instance.

  • Shard map integration

    The split-merge service interacts with the shard map of the application. When you use the split-merge service to split or merge ranges or to move shardlets between shards, the service automatically keeps the shard map up to date. To do so, the service connects to the shard map manager database of the application and maintains ranges and mappings as split/merge/move requests progress. This ensures that the shard map always presents an up-to-date view while split-merge operations are in progress. Split, merge, and shardlet movement operations are implemented by moving a batch of shardlets from the source shard to the target shard. During the shardlet movement operation, the shardlets in the current batch are marked as offline in the shard map and are unavailable for data-dependent routing connections using the OpenConnectionForKey API.

  • Consistent shardlet connections

    When data movement starts for a new batch of shardlets, any data-dependent routing connections that the shard map provided to the shard storing those shardlets are killed. To avoid inconsistencies, subsequent connections from the shard map APIs to those shardlets are blocked while the data movement is in progress. Connections to other shardlets on the same shard are also killed, but succeed again immediately on retry. Once the batch is moved, the shardlets are marked online again for the target shard and the source data is removed from the source shard. The service goes through these steps for every batch until all shardlets have been moved. This leads to several connection kill operations during the course of the complete split/merge/move operation.

  • Managing shardlet availability

    Limiting the connection killing to the current batch of shardlets, as discussed above, restricts the scope of unavailability to one batch of shardlets at a time. This is preferred over an approach where the complete shard would remain offline for all its shardlets during a split or merge operation. The size of a batch, defined as the number of distinct shardlets to move at a time, is a configuration parameter. It can be defined for each split and merge operation depending on the application's availability and performance needs. Note that the range being locked in the shard map may be larger than the batch size specified. This is because the service picks the range size such that the actual number of sharding key values in the data approximately matches the batch size. This is particularly important to remember for sparsely populated sharding keys.

  • Metadata storage

    The split-merge service uses a database to maintain its status and to keep logs during request processing. The user creates this database in their subscription and provides its connection string in the configuration file for the service deployment. Administrators from the user's organization can also connect to this database to review request progress and to investigate detailed information regarding potential failures.

  • Sharding-awareness

    The split-merge service differentiates between (1) sharded tables, (2) reference tables, and (3) normal tables. The semantics of a split/merge/move operation depend on the type of table and are defined as follows:

    • Sharded tables

      Split, merge, and move operations move shardlets from the source shard to the target shard. After successful completion of the overall request, those shardlets are no longer present on the source. Note that the target tables need to exist on the target shard and must not contain data in the target range before the operation is processed.

    • Reference tables

      For reference tables, the split, merge, and move operations copy the data from the source to the target shard. Note, however, that no changes occur on the target shard for a given table if any row is already present in this table on the target. The table has to be empty for any reference table copy operation to be processed.

    • Other tables

      Other tables can be present on either the source or the target of a split and merge operation. The split-merge service disregards these tables for any data movement or copy operations. Note, however, that they can interfere with these operations if constraints are involved.

      The information on reference vs. sharded tables is provided by the SchemaInfo APIs on the shard map. The following example illustrates the use of these APIs on a given shard map manager object:

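      // Requires: using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.Schema;
      // 'smm' is an existing ShardMapManager instance.

      // Create the schema annotations
      SchemaInfo schemaInfo = new SchemaInfo();

      // Reference tables: copied to the target shard
      schemaInfo.Add(new ReferenceTableInfo("dbo", "region"));
      schemaInfo.Add(new ReferenceTableInfo("dbo", "nation"));

      // Sharded tables: schema, table name, sharding key column
      schemaInfo.Add(new ShardedTableInfo("dbo", "customer", "C_CUSTKEY"));
      schemaInfo.Add(new ShardedTableInfo("dbo", "orders", "O_CUSTKEY"));

      // Publish the schema information under the shard map's name
      smm.GetSchemaInfoCollection().Add(Configuration.ShardMapName, schemaInfo);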

      The tables 'region' and 'nation' are defined as reference tables and are copied with split/merge/move operations. 'customer' and 'orders' in turn are defined as sharded tables. C_CUSTKEY and O_CUSTKEY serve as the sharding keys.

  • Referential integrity

    The split-merge service analyzes dependencies between tables and uses foreign key-primary key relationships to stage the operations for moving reference tables and shardlets. In general, reference tables are copied first in dependency order, then shardlets are copied in order of their dependencies within each batch. This is necessary so that FK-PK constraints on the target shard are honored as the new data arrives.

  • Shard map consistency and eventual completion

    In the presence of failures, the split-merge service resumes operations after any outage and aims to complete any in-progress requests. However, there may be unrecoverable situations, for example when the target shard is lost or compromised beyond repair. Under those circumstances, some shardlets that were supposed to be moved may continue to reside on the source shard. The service ensures that shardlet mappings are only updated after the necessary data has been successfully copied to the target. Shardlets are only deleted on the source once all their data has been copied to the target and the corresponding mappings have been updated successfully. The deletion operation happens in the background while the range is already online on the target shard. The split-merge service always ensures correctness of the mappings stored in the shard map.
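
The batch-wise movement described in this section can be pictured with a small, self-contained Python model. This is only an illustrative sketch, not the service's implementation: the `ShardMapModel` class, the `move_shardlets` function, and the in-memory dictionaries standing in for shards are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ShardMapModel:
    """Toy stand-in for a shard map: sharding key -> shard name, plus offline keys."""
    location: dict
    offline: set = field(default_factory=set)

    def open_connection_for_key(self, key):
        # Data-dependent routing is refused while the shardlet is offline.
        if key in self.offline:
            raise ConnectionError(f"shardlet {key} is offline")
        return self.location[key]


def move_shardlets(shard_map, shards, keys, source, target, batch_size):
    """Move the shardlets in `keys` from `source` to `target`, one batch at a time."""
    keys = sorted(keys)
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # 1. Mark only the current batch offline in the shard map.
        shard_map.offline.update(batch)
        # 2. Copy the rows of each shardlet to the target shard.
        for key in batch:
            shards[target][key] = list(shards[source][key])
        # 3. Update the mappings only after the copy has succeeded,
        #    then bring the batch back online on the target.
        for key in batch:
            shard_map.location[key] = target
        shard_map.offline.difference_update(batch)
        # 4. Remove the source copy last (the real service does this in the background).
        for key in batch:
            del shards[source][key]


# Hypothetical sample data: three shardlets on shard_a, two of which are moved.
shards = {
    "shard_a": {1: ["row-1"], 2: ["row-2a", "row-2b"], 3: ["row-3"]},
    "shard_b": {},
}
shard_map = ShardMapModel(location={1: "shard_a", 2: "shard_a", 3: "shard_a"})
move_shardlets(shard_map, shards, keys=[2, 3], source="shard_a",
               target="shard_b", batch_size=1)
print(shard_map.location)
```

With a batch size of 1, at most one shardlet is offline at any moment, which mirrors the trade-off between availability and throughput that the batch size parameter controls.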

The split-merge user interface

The split-merge service package includes a worker role and a web role. The web role is used to submit split-merge requests in an interactive way. The main components of the user interface are as follows:

  • Operation type

    The operation type is a radio button that controls the kind of operation performed by the service for this request. You can choose between the split, merge, and move scenarios. You can also cancel a previously submitted operation. You can use split, merge, and move requests for range shard maps. List shard maps only support move operations.

  • Shard map

    The next section of request parameters covers information about the shard map and the database hosting your shard map. In particular, you need to provide the name of the server and database hosting the shard map, credentials to connect to the shard map database, and finally the name of the shard map. Currently, the operation only accepts a single set of credentials. These credentials need sufficient permissions to perform changes to the shard map as well as to the user data on the shards.

  • Source range (split and merge)

    A split and merge operation processes a range using its low and high key. To specify an operation with an unbounded high key value, check the "High key is max" check box and leave the high key field empty. The range key values that you specify do not need to precisely match a mapping and its boundaries in your shard map. If you do not specify any range boundaries at all, the service infers the closest range for you automatically. You can use the GetMappings.ps1 PowerShell script to retrieve the current mappings in a given shard map.

  • Split source behavior (split)

    For split operations, define the point at which to split the source range. You do this by providing the sharding key where you want the split to occur. Use the radio button to specify whether you want the lower part of the range (excluding the split key) to move, or whether you want the upper part to move (including the split key).

  • Source shardlet (move)

    Move operations are different from split or merge operations as they do not require a range to describe the source. A source for a move is simply identified by the sharding key value that you plan to move.

  • Target shard (split)

    Once you have provided the information on the source of your split operation, you need to define where you want the data to be copied to by providing the server and database name for the target.

  • Target range (merge)

    Merge operations move shardlets to an existing shard. You identify the existing shard by providing the range boundaries of the existing range that you want to merge with.

  • Batch size

    The batch size controls the number of shardlets that go offline at a time during the data movement. This is an integer value; use smaller values when you are sensitive to long periods of downtime for shardlets. Larger values increase the time that a given shardlet is offline but may improve performance.

  • Operation ID (cancel)

    If you have an ongoing operation that is no longer needed, you can cancel the operation by providing its operation ID in this field. You can retrieve the operation ID from the request status table (see the Status tables section) or from the output in the web browser where you submitted the request.
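
To make the split semantics concrete, here is a hedged Python sketch of how a source range divides at the split key. It assumes ranges are half-open (low key inclusive, high key exclusive), which is consistent with the split key belonging to the upper part; `None` as the high key stands in for the "High key is max" check box. The function name `split_range`, its parameters, and the rule that the split key must lie strictly inside the range are assumptions made for this example.

```python
from typing import Optional, Tuple

KeyRange = Tuple[int, Optional[int]]  # (low inclusive, high exclusive); None = max


def split_range(low: int, high: Optional[int], split_key: int,
                move_upper: bool) -> Tuple[KeyRange, KeyRange]:
    """Return (range_that_stays, range_that_moves) for a split at `split_key`."""
    # Assumption for this sketch: a split key on the boundary is rejected.
    if split_key <= low or (high is not None and split_key >= high):
        raise ValueError("split key must lie strictly inside the source range")
    lower: KeyRange = (low, split_key)   # excludes the split key
    upper: KeyRange = (split_key, high)  # includes the split key
    return (lower, upper) if move_upper else (upper, lower)


stays, moves = split_range(0, 200, split_key=100, move_upper=True)
print("stays:", stays, "moves:", moves)
```

Choosing which half moves does not change where the boundary falls: the split key always ends up as the low key of the upper range.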

Requirements and limitations

The current implementation of the split-merge service is subject to the following requirements and limitations:

  • The shards need to exist and be registered in the shard map before a split-merge operation on these shards can be performed.
  • The service does not create tables or any other database objects automatically as part of its operations. This means that the schema for all sharded tables and reference tables needs to exist on the target shard prior to any split/merge/move operation. Sharded tables in particular are required to be empty in the range where new shardlets are to be added by a split/merge/move operation. Otherwise, the operation fails the initial consistency check on the target shard. Also note that reference data is only copied if the reference table is empty, and that there are no consistency guarantees with regard to other concurrent write operations on the reference tables. We recommend that no other write operations change the reference tables while split/merge operations are running.
  • The service relies on row identity established by a unique index or key that includes the sharding key to improve performance and reliability for large shardlets. This allows the service to move data at an even finer granularity than just the sharding key value. This helps to reduce the maximum amount of log space and locks that are required during the operation. Consider creating a unique index or a primary key including the sharding key on a given table if you want to use that table with split/merge/move requests. For performance reasons, the sharding key should be the leading column in the key or the index.
  • During the course of request processing, some shardlet data may be present both on the source and the target shard. This is necessary to protect against failures during the shardlet movement. The integration of split-merge with the shard map ensures that connections through the data-dependent routing APIs using the OpenConnectionForKey method on the shard map do not see any inconsistent intermediate states. However, when connecting to the source or the target shards without using the OpenConnectionForKey method, inconsistent intermediate states might be visible while split/merge/move requests are in progress. These connections may show partial or duplicate results depending on the timing or the shard underlying the connection. This limitation currently includes the connections made by Elastic Scale Multi-Shard-Queries.
  • The metadata database for the split-merge service must not be shared between different roles. For example, a role of the split-merge service running in staging needs to point to a different metadata database than the production role.
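
The emptiness requirements above lend themselves to a pre-flight check before submitting a request. The following Python sketch uses an in-memory SQLite database purely as a stand-in for a target shard (Azure SQL Database would be queried through its own driver, and T-SQL syntax differs in places). The helper names `sharded_range_is_empty` and `reference_table_is_empty` are hypothetical, as are the columns O_ORDERKEY, R_REGIONKEY, and R_NAME; the table names and O_CUSTKEY come from the SchemaInfo example earlier in this article.

```python
import sqlite3


def sharded_range_is_empty(conn, table, key_column, low, high):
    """True if `table` holds no row whose sharding key is in [low, high)."""
    # Table and column names cannot be bound as parameters, so they are
    # interpolated; pass only trusted identifiers.
    sql = (f'SELECT COUNT(*) FROM "{table}" '
           f'WHERE "{key_column}" >= ? AND "{key_column}" < ?')
    (count,) = conn.execute(sql, (low, high)).fetchone()
    return count == 0


def reference_table_is_empty(conn, table):
    """True if the reference table has no rows, so a copy would be processed."""
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count == 0


# Stand-in for the target shard. The schema must already exist there.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE orders (O_CUSTKEY INTEGER NOT NULL, O_ORDERKEY INTEGER NOT NULL, "
    "PRIMARY KEY (O_CUSTKEY, O_ORDERKEY))"  # sharding key is the leading column
)
target.execute("CREATE TABLE region (R_REGIONKEY INTEGER PRIMARY KEY, R_NAME TEXT)")
target.execute("INSERT INTO orders VALUES (150, 1)")  # leftover row in the target

print(sharded_range_is_empty(target, "orders", "O_CUSTKEY", 100, 200))
print(reference_table_is_empty(target, "region"))
```

A check like this only approximates the service's own initial consistency check, but it can surface leftover rows in the target range before a request is submitted.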

Billing

The split-merge service runs as a cloud service in your Azure subscription. Therefore charges for cloud services apply to your instance of the service. Unless you frequently perform split/merge/move operations, we recommend you delete your split-merge cloud service. That saves costs for running or deployed cloud service instances. You can re-deploy and start your readily runnable configuration whenever you need to perform split or merge operations.

Monitoring

Status tables

The split-merge service provides the RequestStatus table in the metadata store database for monitoring completed and ongoing requests. The table lists a row for each split-merge request that has been submitted to this instance of the split-merge service. It gives the following information for each request:

  • Timestamp

    The time and date when the request was started.

  • OperationId

    A GUID that uniquely identifies the request. This ID can also be used to cancel the operation while it is still ongoing.

  • Status

    The current state of the request. For ongoing requests, it also lists the phase the request is currently in.

  • CancelRequest

    A flag that indicates whether the request has been canceled.

  • Progress

    A percentage estimate of completion for the operation. A value of 50 indicates that the operation is approximately 50% complete.

  • Details

    An XML value that provides a more detailed progress report. The progress report is periodically updated as sets of rows are copied from source to target. In case of failures or exceptions, this column also includes more detailed information about the failure.
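
As a rough illustration of how these columns might be consumed, the Python sketch below formats RequestStatus rows that have already been fetched from the metadata database. The `RequestStatusRow` class and `summarize` function are hypothetical, the sample GUID and Status string are made up for the example, and the actual column types and Status values in your metadata database may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RequestStatusRow:
    """One row of the RequestStatus table, using the columns listed above."""
    timestamp: datetime
    operation_id: str
    status: str
    cancel_request: bool
    progress: int
    details: str


def summarize(row: RequestStatusRow) -> str:
    """Build a one-line progress summary for a request."""
    line = f"{row.operation_id}: {row.status} ({row.progress}% complete)"
    if row.cancel_request:
        line += " [cancel requested]"
    return line


# Made-up sample data; none of these values come from a real deployment.
row = RequestStatusRow(
    timestamp=datetime(2020, 1, 1, 12, 0, 0),
    operation_id="00000000-0000-0000-0000-000000000001",
    status="ExampleStatus",  # placeholder, not a documented Status value
    cancel_request=False,
    progress=50,
    details="<details />",
)
print(summarize(row))
```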

Azure Diagnostics

The split-merge service uses Azure Diagnostics based on Azure SDK 2.5 for monitoring and diagnostics. You control the diagnostics configuration as explained here: Enabling Diagnostics in Azure Cloud Services and Virtual Machines. The download package includes two diagnostics configurations - one for the web role and one for the worker role. They include the definitions to log Performance Counters, IIS logs, Windows Event Logs, and split-merge application event logs.

Deploy Diagnostics

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Important

The PowerShell Azure Resource Manager module is still supported, but all future development is for the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az module and in the AzureRm modules are substantially identical.

To enable monitoring and diagnostics using the diagnostic configuration for the web and worker roles provided by the NuGet package, run the following commands using Azure PowerShell:

# Storage account that receives the diagnostics data
$storageName = "<azureStorageAccount>"
$key = "<azureStorageAccountKey>"
$storageContext = New-AzStorageContext -StorageAccountName $storageName -StorageAccountKey $key
$configPath = "<filePath>\SplitMergeWebContent.diagnostics.xml"
$serviceName = "<cloudServiceName>"

# Enable diagnostics for the web role
Set-AzureServiceDiagnosticsExtension -StorageContext $storageContext `
    -DiagnosticsConfigurationPath $configPath -ServiceName $serviceName `
    -Slot Production -Role "SplitMergeWeb"

# Enable diagnostics for the worker role. The package ships a separate
# diagnostics configuration for the worker role; point $configPath at that
# file first if you want the worker-specific settings.
Set-AzureServiceDiagnosticsExtension -StorageContext $storageContext `
    -DiagnosticsConfigurationPath $configPath -ServiceName $serviceName `
    -Slot Production -Role "SplitMergeWorker"

You can find more information on how to configure and deploy diagnostics settings here: Enabling Diagnostics in Azure Cloud Services and Virtual Machines.

Retrieve diagnostics

You can easily access your diagnostics from the Visual Studio Server Explorer, in the Azure part of the Server Explorer tree. Open a Visual Studio instance, and in the menu bar click View, then Server Explorer. Click the Azure icon to connect to your Azure subscription. Then navigate to Azure -> Storage -> <your storage account> -> Tables -> WADLogsTable. For more information, see Server Explorer.

[Figure: WADLogsTable]

The WADLogsTable highlighted in the figure above contains the detailed events from the split-merge service's application log. Note that the default configuration of the downloaded package is geared towards a production deployment. Therefore the interval at which logs and counters are pulled from the service instances is large (5 minutes). For test and development, lower the interval by adjusting the diagnostics settings of the web or the worker role to your needs. Right-click the role in the Visual Studio Server Explorer (see above) and then adjust the Transfer Period in the dialog for the Diagnostics configuration settings:

[Figure: Configuration]

Performance

In general, better performance is to be expected from higher, more performant service tiers. The higher IO, CPU, and memory allocations of the higher service tiers benefit the bulk copy and delete operations that the split-merge service uses. For that reason, consider increasing the service tier of just those databases for a defined, limited period of time.

The service also performs validation queries as part of its normal operations. These validation queries check for unexpected presence of data in the target range and ensure that any split/merge/move operation starts from a consistent state. These queries all work over sharding key ranges defined by the scope of the operation and the batch size provided as part of the request definition. These queries perform best when an index is present that has the sharding key as the leading column.

In addition, a uniqueness property with the sharding key as the leading column allows the service to use an optimized approach that limits resource consumption in terms of log space and memory. This uniqueness property is required to move large data sizes (typically above 1 GB).

How to upgrade

  1. Follow the steps in Deploy a split-merge service.
  2. Change your cloud service configuration file for your split-merge deployment to reflect the new configuration parameters. A new required parameter is the information about the certificate used for encryption. An easy way to do this is to compare the new configuration template file from the download against your existing configuration. Make sure you add the settings for "DataEncryptionPrimaryCertificateThumbprint" and "DataEncryptionPrimary" for both the web and the worker role.
  3. Before deploying the update to Azure, ensure that all currently running split-merge operations have finished. You can do this by querying the RequestStatus and PendingWorkflows tables in the split-merge metadata database for ongoing requests.
  4. Update your existing cloud service deployment for split-merge in your Azure subscription with the new package and your updated service configuration file.

You do not need to provision a new metadata database for split-merge to upgrade. The new version automatically upgrades your existing metadata database to the new version.
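
Step 2 can be double-checked mechanically. The Python sketch below parses a cloud service configuration (.cscfg) document and reports which of the two entries named in step 2 are missing for each role. The role names are taken from the diagnostics commands earlier in this article. The `missing_settings` helper and the inline sample document are hypothetical, and the element layout in the sample (a `Setting` for the thumbprint, a `Certificate` for "DataEncryptionPrimary") is an assumption; the helper therefore matches on the `name` attribute of any element under the role rather than on a specific element type.

```python
import xml.etree.ElementTree as ET

NS = {"sc": "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"}
REQUIRED = ("DataEncryptionPrimaryCertificateThumbprint", "DataEncryptionPrimary")
ROLES = ("SplitMergeWeb", "SplitMergeWorker")


def missing_settings(cscfg_text):
    """Map each role to the required entry names absent from its configuration."""
    root = ET.fromstring(cscfg_text)
    result = {}
    for role_name in ROLES:
        present = set()
        for role in root.findall("sc:Role", NS):
            if role.get("name") == role_name:
                # Collect the name attribute of every element nested under the role.
                present.update(el.get("name") for el in role.iter() if el.get("name"))
        result[role_name] = [name for name in REQUIRED if name not in present]
    return result


# Hypothetical sample; the worker role deliberately lacks one entry.
SAMPLE = """<ServiceConfiguration serviceName="SplitMergeService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="SplitMergeWeb">
    <ConfigurationSettings>
      <Setting name="DataEncryptionPrimaryCertificateThumbprint" value="" />
    </ConfigurationSettings>
    <Certificates>
      <Certificate name="DataEncryptionPrimary" thumbprint="" thumbprintAlgorithm="sha1" />
    </Certificates>
  </Role>
  <Role name="SplitMergeWorker">
    <ConfigurationSettings>
      <Setting name="DataEncryptionPrimaryCertificateThumbprint" value="" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>"""

print(missing_settings(SAMPLE))
```

The check only confirms that the entries are present; it does not validate that the thumbprint values are filled in or correct.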

Best practices & troubleshooting

  • Define a test tenant and exercise your most important split/merge/move operations with the test tenant across several shards. Ensure that all metadata is defined correctly in your shard map and that the operations do not violate constraints or foreign keys.
  • Keep the test tenant data size above the maximum data size of your largest tenant to ensure you do not encounter data size related issues. This helps you assess an upper bound on the time it takes to move a single tenant around.
  • Make sure that your schema allows deletions. The split-merge service requires the ability to remove data from the source shard once the data has been successfully copied to the target. For example, delete triggers can prevent the service from deleting the data on the source and may cause operations to fail.
  • The sharding key should be the leading column in your primary key or unique index definition. That ensures the best performance for the split or merge validation queries, and for the actual data movement and deletion operations, which always operate on sharding key ranges.
  • Collocate your split-merge service in the region and data center where your databases reside.
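
The advice about deletions can be exercised against a test tenant before running a real operation. The Python sketch below probes whether a shardlet's rows can be deleted and then rolls the probe back. It uses an in-memory SQLite database only as a stand-in for a source shard (trigger syntax and error types differ in Azure SQL Database); the `deletion_is_allowed` helper, the `block_delete` trigger, and the C_NAME column are hypothetical.

```python
import sqlite3


def deletion_is_allowed(conn, table, key_column, key):
    """Try deleting one shardlet's rows, then roll back; report whether it worked.

    The key should exist in the table, because a delete that matches no rows
    does not fire a row-level trigger.
    """
    try:
        conn.execute(f'DELETE FROM "{table}" WHERE "{key_column}" = ?', (key,))
        return True
    except sqlite3.DatabaseError:
        return False
    finally:
        conn.rollback()  # never keep the probe's changes


# Stand-in for the source shard with a single test tenant.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (C_CUSTKEY INTEGER PRIMARY KEY, C_NAME TEXT)")
source.execute("INSERT INTO customer VALUES (42, 'test tenant')")
source.commit()
print(deletion_is_allowed(source, "customer", "C_CUSTKEY", 42))

# A delete trigger like this one would prevent source cleanup.
source.execute(
    "CREATE TRIGGER block_delete BEFORE DELETE ON customer "
    "BEGIN SELECT RAISE(ABORT, 'deletes are blocked'); END"
)
source.commit()
print(deletion_is_allowed(source, "customer", "C_CUSTKEY", 42))
```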

Additional resources

Not using elastic database tools yet? Check out our Getting Started Guide.