使用弹性数据库工具进行多分片查询Multi-shard querying using elastic database tools

适用于:是Azure SQL 数据库 APPLIES TO: yesAzure SQL Database

概述Overview

可以使用弹性数据库工具创建分片数据库解决方案。With the Elastic Database tools, you can create sharded database solutions. 多分片查询用于诸如数据收集/报告等需要跨多个分片运行查询的任务。Multi-shard querying is used for tasks such as data collection/reporting that require running a query that stretches across several shards. (相比之下,数据依赖型路由会在单个分片上执行所有操作。)(Contrast this to data-dependent routing, which performs all work on a single shard.)

  1. 使用 TryGetRangeShardMapJava.NET)、TryGetListShardMapJava.NET)或 GetShardMapJava.NET)方法获取 RangeShardMapJava.NET)或 ListShardMapJava.NET)。Get a RangeShardMap (Java, .NET) or ListShardMap (Java, .NET) using the TryGetRangeShardMap (Java, .NET), the TryGetListShardMap (Java, .NET), or the GetShardMap (Java, .NET) method. 请参阅构造 ShardMapManager获取 RangeShardMap 或 ListShardMapSee Constructing a ShardMapManager and Get a RangeShardMap or ListShardMap.
  2. 创建 MultiShardConnectionJava.NET)对象。Create a MultiShardConnection (Java, .NET) object.
  3. 创建 MultiShardStatement 或 MultiShardCommandJava.NET)。Create a MultiShardStatement or MultiShardCommand (Java, .NET).
  4. 设置 T-SQL 命令的 CommandText 属性Java.NET)。Set the CommandText property (Java, .NET) to a T-SQL command.
  5. 通过调用 ExecuteQueryAsync 或 ExecuteReaderJava.NET)方法执行该命令。Execute the command by calling the ExecuteQueryAsync or ExecuteReader (Java, .NET) method.
  6. 使用 MultiShardResultSet 或 MultiShardDataReaderJava.NET)类查看结果。View the results using the MultiShardResultSet or MultiShardDataReader (Java, .NET) class.

示例Example

以下代码使用给定的 ShardMap (名为 myShardMap)演示多分片查询的用法。The following code illustrates the usage of multi-shard querying using a given ShardMap named myShardMap.

using (MultiShardConnection conn = new MultiShardConnection(myShardMap.GetShards(), myShardConnectionString))
{
    using (MultiShardCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT c1, c2, c3 FROM ShardedTable";
        cmd.CommandType = CommandType.Text;
        cmd.ExecutionOptions = MultiShardExecutionOptions.IncludeShardNameColumn;
        cmd.ExecutionPolicy = MultiShardExecutionPolicy.PartialResults;

        using (MultiShardDataReader sdr = cmd.ExecuteReader())
        {
            while (sdr.Read())
            {
                var c1Field = sdr.GetString(0);
                var c2Field = sdr.GetFieldValue<int>(1);
                var c3Field = sdr.GetFieldValue<Int64>(2);
            }
        }
    }
}

主要区别在于多分片连接的构建。A key difference is the construction of multi-shard connections. 其中,SqlConnection 对单一数据库执行操作,而 MultiShardConnection 则将分片集合用作输入。Where SqlConnection operates on an individual database, the MultiShardConnection takes a collection of shards as its input. 填充分片映射中的分片集合。Populate the collection of shards from a shard map. 然后,使用 UNION ALL 语义组成一个总体结果,在分片集合上执行查询。The query is then executed on the collection of shards using UNION ALL semantics to assemble a single overall result. 或者,也可以在命令上使用 ExecutionOptions 属性,将行所源自的分片的名称添加到输出。Optionally, the name of the shard where the row originates from can be added to the output using the ExecutionOptions property on command.

请注意对 myShardMap.GetShards() 的调用。Note the call to myShardMap.GetShards(). 此方法可从分片映射中检索所有分片,并提供一种轻松方式在所有相关数据库上运行查询。This method retrieves all shards from the shard map and provides an easy way to run a query across all relevant databases. 对通过调用 myShardMap.GetShards() 返回的集合执行 LINQ 查询,进一步优化用于多分片查询的分片集合。The collection of shards for a multi-shard query can be refined further by performing a LINQ query over the collection returned from the call to myShardMap.GetShards(). 多分片查询中的当前功能已随部分结果策略一起被设计为供数十至数百种分片使用。In combination with the partial results policy, the current capability in multi-shard querying has been designed to work well for tens up to hundreds of shards.

多分片查询目前存在的一个限制是,缺少对需要查询的分片和 shardlet 进行验证。A limitation with multi-shard querying is currently the lack of validation for shards and shardlets that are queried. 尽管依赖于数据的路由会在查询时验证给定的分片是否为分片映射的一部分,但多分片查询不会执行此检查。While data-dependent routing verifies that a given shard is part of the shard map at the time of querying, multi-shard queries do not perform this check. 这可能会导致多分片查询在已从分片映射中删除的分片上运行。This can lead to multi-shard queries running on databases that have been removed from the shard map.

多分片查询和拆分/合并操作Multi-shard queries and split-merge operations

多分片查询不会验证查询的数据库上的 shardlet 是否参与正在进行的拆分/合并操作。Multi-shard queries do not verify whether shardlets on the queried database are participating in ongoing split-merge operations. (请参阅使用弹性数据库拆分/合并工具进行缩放。)) 这可能会导致不一致问题,即为同一多分片查询中的多个数据库显示同一 shardlet 中的行。(See Scaling using the Elastic Database split-merge tool.) This can lead to inconsistencies where rows from the same shardlet show for multiple databases in the same multi-shard query. 请注意这些限制并在执行多分片查询时,考虑关闭正在进行的拆分/合并操作以及对分片映射的更改。Be aware of these limitations and consider draining ongoing split-merge operations and changes to the shard map while performing multi-shard queries.

其他资源Additional resources

尚未使用弹性数据库工具?Not using elastic database tools yet? 请查看入门指南Check out our Getting Started Guide.