Query an Azure Cosmos container

This article explains how to query a container (collection, graph, or table) in Azure Cosmos DB. In particular, it covers how in-partition and cross-partition queries work.

In-partition query

When you query data from a container, Azure Cosmos DB automatically optimizes the query if it specifies a partition key filter. It routes the query to the physical partitions that correspond to the partition key values specified in the filter.

For example, consider the following query with an equality filter on DeviceId. If you run this query on a container partitioned on DeviceId, it is routed to a single physical partition.

    SELECT * FROM c WHERE c.DeviceId = 'XMS-0001'

Like the previous example, the following query is also routed to a single physical partition. Adding a filter on Location does not change this:

    SELECT * FROM c WHERE c.DeviceId = 'XMS-0001' AND c.Location = 'Seattle'

The following query has a range filter on the partition key, so it is not scoped to a single physical partition. To be an in-partition query, a query must have an equality filter that includes the partition key:

    SELECT * FROM c WHERE c.DeviceId > 'XMS-0001'
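
The routing rule described above can be illustrated with a small toy model. This is only a sketch: the hash function, the `partition_for` and `target_partitions` names, and the fixed partition count are assumptions made for illustration and do not reflect how Azure Cosmos DB actually hashes or places partition keys.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key value to a physical partition (toy hash, not Cosmos DB's)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def target_partitions(op: str, value: str, partition_count: int) -> list:
    """Return the physical partitions a filter on the partition key must visit."""
    if op == "=":
        # Equality on the partition key: exactly one partition can hold the value.
        return [partition_for(value, partition_count)]
    # A range filter (>, <, ...) cannot be hashed to one partition, so fan out.
    return list(range(partition_count))


single = target_partitions("=", "XMS-0001", 4)   # one partition
fan_out = target_partitions(">", "XMS-0001", 4)  # all four partitions
```

In this model the equality filter always resolves to one partition, while the range filter visits all of them, which mirrors the difference between the first and last queries in this section.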

Cross-partition query

The following query doesn't have a filter on the partition key (DeviceId). Therefore, it must fan out to all physical partitions, where it runs against each partition's index:

    SELECT * FROM c WHERE c.Location = 'Seattle'

Each physical partition has its own index. Therefore, when you run a cross-partition query on a container, you are effectively running one query per physical partition. Azure Cosmos DB automatically aggregates the results across the different physical partitions.

The indexes in different physical partitions are independent of one another. There is no single index in Azure Cosmos DB that spans all physical partitions.
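
As a rough illustration of "one query per physical partition", the following sketch models each partition with its own local index and merges the per-partition results. The `PhysicalPartition` class and `cross_partition_query` function are hypothetical names for this toy model, not part of any Azure Cosmos DB SDK.

```python
from collections import defaultdict


class PhysicalPartition:
    """Toy partition that stores items and keeps its own local index on Location."""

    def __init__(self):
        self.items = []
        self.location_index = defaultdict(list)  # Location -> positions in self.items

    def insert(self, item):
        self.location_index[item["Location"]].append(len(self.items))
        self.items.append(item)

    def query_by_location(self, location):
        # The lookup only consults this partition's index.
        return [self.items[i] for i in self.location_index.get(location, [])]


def cross_partition_query(partitions, location):
    """Run the same query against every partition and merge the results."""
    results = []
    for partition in partitions:
        results.extend(partition.query_by_location(location))
    return results


partitions = [PhysicalPartition(), PhysicalPartition()]
partitions[0].insert({"DeviceId": "XMS-0001", "Location": "Seattle"})
partitions[1].insert({"DeviceId": "XMS-0002", "Location": "Seattle"})
partitions[1].insert({"DeviceId": "XMS-0003", "Location": "Redmond"})
seattle = cross_partition_query(partitions, "Seattle")
```

Every partition is visited even when it holds no matching items, which is the cost that distinguishes a fan-out query from an in-partition query.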

Parallel cross-partition query

Azure Cosmos DB SDK versions 1.9.0 and later support parallel query execution options. Parallel cross-partition queries let you perform low-latency cross-partition queries.

You can manage parallel query execution by tuning the following parameters:

  • MaxConcurrency: Sets the maximum number of simultaneous network connections to the container's partitions. If you set this property to -1, the SDK manages the degree of parallelism. If MaxConcurrency is set to 0, there is a single network connection to the container's partitions.

  • MaxBufferedItemCount: Trades query latency against client-side memory utilization. If this option is omitted or set to -1, the SDK manages the number of items buffered during parallel query execution.
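
The effect of a MaxConcurrency-style setting can be sketched with a thread pool. This is a toy model under stated assumptions: `resolve_workers` and `parallel_query` are hypothetical helpers, the "SDK-managed" case (-1) is modeled simply as one worker per partition, and buffering (MaxBufferedItemCount) is not modeled at all. It does not show how any Azure Cosmos DB SDK is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(max_concurrency: int, partition_count: int) -> int:
    """Translate a MaxConcurrency-style setting into a worker count (toy rule)."""
    if max_concurrency == -1:
        # Assumption: "SDK-managed" is modeled as one worker per partition.
        return max(1, partition_count)
    if max_concurrency == 0:
        # A single connection: partitions are queried one after another.
        return 1
    return max(1, min(max_concurrency, partition_count))


def parallel_query(partition_queries, max_concurrency: int):
    """Run one callable per physical partition and merge results in partition order."""
    workers = resolve_workers(max_concurrency, len(partition_queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = pool.map(lambda run: run(), partition_queries)
        return [item for chunk in per_partition for item in chunk]


# Each callable stands in for the query against one physical partition.
queries = [lambda: ["a"], lambda: ["b", "c"], lambda: []]
merged = parallel_query(queries, max_concurrency=-1)
```

The merged result is the same regardless of the worker count; only the degree of overlap between the per-partition calls changes.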

Because Azure Cosmos DB can parallelize cross-partition queries, query latency generally scales well as the system adds physical partitions. However, the RU charge increases significantly as the total number of physical partitions increases.

When you run a cross-partition query, you are essentially running a separate query against each physical partition. While cross-partition queries will use the index, if available, they are still not nearly as efficient as in-partition queries.

Useful example

Here's an analogy to better understand cross-partition queries:

Imagine you are a delivery driver who has to deliver packages to different apartment complexes. Each apartment complex has a list on the premises with all of the residents' unit numbers. We can compare each apartment complex to a physical partition and each list to the physical partition's index.

We can compare in-partition and cross-partition queries using this example:

In-partition query

If the delivery driver knows the correct apartment complex (physical partition), they can immediately drive to the correct building. The driver can check the apartment complex's list of residents' unit numbers (the index) and quickly deliver the appropriate packages. In this case, the driver does not waste any time or effort driving to an apartment complex just to check whether any package recipients live there.

Cross-partition query (fan-out)

If the delivery driver does not know the correct apartment complex (physical partition), they'll need to drive to every single apartment complex and check the list of residents' unit numbers (the index). Once they arrive at each apartment complex, they'll still be able to use the list of each resident's address. However, they will need to check every apartment complex's list, whether or not any package recipients live there. This is how cross-partition queries work. While they can use the index (they don't need to knock on every single door), they must separately check the index for every physical partition.

Cross-partition query (scoped to only a few physical partitions)

If the delivery driver knows that all package recipients live within a certain few apartment complexes, they won't need to drive to every single one. While driving to a few apartment complexes still requires more work than visiting a single building, the delivery driver still saves significant time and effort. If a query has the partition key in its filter with the IN keyword, it will only check the indexes of the relevant physical partitions for data.
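
The scoped case can be sketched as follows for a filter such as c.DeviceId IN ('XMS-0001', 'XMS-0002'). This is a toy model: the hash function and the `partition_for` and `partitions_for_in_filter` names are assumptions for illustration, not the placement scheme Azure Cosmos DB uses.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key value to a physical partition (toy hash, not Cosmos DB's)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partitions_for_in_filter(keys, partition_count: int) -> list:
    """Distinct physical partitions that an IN filter on the partition key must visit."""
    return sorted({partition_for(key, partition_count) for key in keys})


# Two key values can touch at most two of the 100 modeled partitions.
visited = partitions_for_in_filter(["XMS-0001", "XMS-0002"], 100)
```

The number of partitions visited is bounded by the number of values in the IN list, rather than by the total number of physical partitions.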

Avoiding cross-partition queries

For most containers, it's inevitable that you will have some cross-partition queries. Having some cross-partition queries is OK. Nearly all query operations are supported across partitions (both logical partition keys and physical partitions). Azure Cosmos DB also has many optimizations in the query engine and client SDKs to parallelize query execution across physical partitions.

For most read-heavy scenarios, we recommend simply choosing as the partition key the property that appears most often in your query filters. You should also make sure your partition key adheres to other partition key selection best practices.

Avoiding cross-partition queries typically only matters with large containers. You are charged a minimum of about 2.5 RUs each time you check a physical partition's index for results, even if no items in the physical partition match the query's filter. As such, if you have only one (or just a few) physical partitions, cross-partition queries will not consume significantly more RUs than in-partition queries.
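
To make the arithmetic concrete, the following sketch multiplies the approximate per-partition minimum quoted above by the number of physical partitions. The function name is hypothetical, and the result is only a rough floor on the fan-out overhead, not a prediction of the actual charge for any query.

```python
# Approximate minimum charge per physical partition index check, as quoted in the text.
MIN_RU_PER_PARTITION_CHECK = 2.5


def fan_out_ru_floor(physical_partitions: int) -> float:
    """Rough lower bound on the RU charge of a query that checks every partition."""
    return MIN_RU_PER_PARTITION_CHECK * physical_partitions


small_container = fan_out_ru_floor(1)   # about 2.5 RUs
large_container = fan_out_ru_floor(40)  # about 100 RUs before any results are read
```

With one partition the floor is negligible; with dozens of partitions the same query starts from a much higher baseline, which is why the advice applies mainly to large containers.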

The number of physical partitions is tied to the amount of provisioned RUs. Each physical partition allows for up to 10,000 provisioned RUs and can store up to 50 GB of data. Azure Cosmos DB automatically manages physical partitions for you. The number of physical partitions in your container depends on your provisioned throughput and consumed storage.
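
The per-partition limits above imply a minimum number of physical partitions for a given workload. The sketch below computes that lower bound; the function name is hypothetical, and because Azure Cosmos DB manages partitions itself, the actual count for a container may be higher than this estimate.

```python
import math

MAX_RU_PER_PARTITION = 10_000  # provisioned RUs per physical partition, from the text
MAX_GB_PER_PARTITION = 50      # storage per physical partition, from the text


def estimate_physical_partitions(provisioned_ru: float, stored_gb: float) -> int:
    """Lower-bound estimate of physical partitions needed for throughput and storage."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PARTITION)
    by_storage = math.ceil(stored_gb / MAX_GB_PER_PARTITION)
    # A container always has at least one physical partition.
    return max(1, by_throughput, by_storage)


small = estimate_physical_partitions(400, 1)          # 1
threshold = estimate_physical_partitions(30_000, 100)  # 3 (throughput dominates)
storage_bound = estimate_physical_partitions(10_000, 120)  # 3 (storage dominates)
```

Whichever of throughput or storage demands more partitions determines the estimate.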

You should try to avoid cross-partition queries if your workload meets either of the criteria below:

  • You plan to have over 30,000 RUs provisioned
  • You plan to store over 100 GB of data

Next steps

See the following articles to learn about partitioning in Azure Cosmos DB: