Partitioning in Azure Cosmos DB

Azure Cosmos DB uses partitioning to scale individual containers in a database to meet the performance needs of your application. With partitioning, the items in a container are divided into distinct subsets called logical partitions. Logical partitions are formed based on the value of a partition key that is associated with each item in the container. All items in a logical partition have the same partition key value.

For example, suppose a container holds items, and each item has a unique value for the UserID property. If UserID serves as the partition key for the items in the container and there are 1,000 unique UserID values, 1,000 logical partitions are created for the container.
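The UserID example can be sketched in a few lines of Python. This is only an in-memory illustration, not how Azure Cosmos DB stores data; the helper `group_by_partition_key` and the sample items are hypothetical.

```python
from collections import defaultdict

def group_by_partition_key(items, key_name):
    """Group items into logical partitions by their partition key value."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key_name]].append(item)
    return dict(partitions)

# Hypothetical sample data: 1,000 items, each with a unique UserID.
items = [{"id": f"item-{n}", "UserID": f"user-{n}"} for n in range(1000)]

logical_partitions = group_by_partition_key(items, "UserID")
print(len(logical_partitions))  # 1000 distinct UserID values -> 1000 logical partitions
```

If several items shared a UserID, they would land in the same list, which mirrors the rule that all items in a logical partition have the same partition key value.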

In addition to a partition key, which determines the item's logical partition, each item in a container has an item ID that is unique within its logical partition. Combining the partition key and the item ID creates the item's index, which uniquely identifies the item.
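As a rough sketch of this identity rule, the toy class below keys every item by the pair (partition key value, item ID). The `Container` class and its methods are hypothetical names for illustration and are not the Azure Cosmos DB SDK.

```python
class Container:
    """Toy in-memory container: items are identified by (partition key value, item ID)."""

    def __init__(self, partition_key_name):
        self.partition_key_name = partition_key_name
        self._items = {}

    def create_item(self, item):
        key = (item[self.partition_key_name], item["id"])
        if key in self._items:
            # The item ID must be unique within a logical partition.
            raise ValueError(f"item {key} already exists")
        self._items[key] = item

    def read_item(self, item_id, partition_key):
        return self._items[(partition_key, item_id)]


container = Container("UserID")
container.create_item({"id": "1", "UserID": "alice", "note": "first"})
# The same item ID is fine in a different logical partition.
container.create_item({"id": "1", "UserID": "bob", "note": "same ID, different partition"})

try:
    container.create_item({"id": "1", "UserID": "alice", "note": "duplicate"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Reading an item back requires both parts of the pair, for example `container.read_item("1", "bob")`.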

Choosing a partition key is an important decision that affects your application's performance.

Managing logical partitions

Azure Cosmos DB transparently and automatically manages the placement of logical partitions on physical partitions to efficiently satisfy the scalability and performance needs of the container. As the throughput and storage requirements of an application increase, Azure Cosmos DB moves logical partitions to automatically spread the load across a greater number of servers.

Azure Cosmos DB uses hash-based partitioning to spread logical partitions across physical partitions. It hashes the partition key value of an item, and the hashed result determines the physical partition. Azure Cosmos DB allocates the key space of partition key hashes evenly across the physical partitions.
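The general idea of hash-based partitioning can be sketched as follows. Everything specific here is an assumption made for illustration: the hash function (SHA-256 truncated to 32 bits), the size of the hash space, and the function names are hypothetical, and the service's internal hash function and range assignment are not described in this article.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 32  # assumed size of the hash key space

def hash_partition_key(value):
    """Map a partition key value to an integer in [0, HASH_SPACE)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def physical_partition_for(value, physical_partition_count):
    """Split the hash space into equal contiguous ranges, one per physical partition."""
    range_size = HASH_SPACE // physical_partition_count
    index = hash_partition_key(value) // range_size
    # Clamp the few hashes that fall past the last full range when the split is uneven.
    return min(index, physical_partition_count - 1)

# Hypothetical keys: 1,000 UserID values spread over 4 physical partitions.
placement = Counter(physical_partition_for(f"user-{n}", 4) for n in range(1000))
print(sorted(placement.items()))
```

Because the hash is deterministic, every item with the same partition key value maps to the same physical partition, while distinct values spread across all of them.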

Queries that access data within a single logical partition are more cost-effective than queries that access multiple partitions. Transactions (in stored procedures or triggers) are allowed only against items in a single logical partition.
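To illustrate why single-partition queries are cheaper, the sketch below counts how many logical partitions a query has to visit. It uses partitions visited as a stand-in for cost and does not model actual request unit charges; the `query` function and the sample data are hypothetical.

```python
def query(logical_partitions, predicate, partition_key=None):
    """Return (matching items, number of logical partitions visited)."""
    if partition_key is not None:
        # Partition key supplied: only that logical partition is visited.
        targets = [partition_key] if partition_key in logical_partitions else []
    else:
        # No partition key: every logical partition has to be visited.
        targets = list(logical_partitions)
    results = [
        item
        for pk in targets
        for item in logical_partitions[pk]
        if predicate(item)
    ]
    return results, len(targets)

def is_large(order):
    return order["total"] > 50

# Hypothetical orders, partitioned by user name.
orders_by_user = {
    "alice": [{"id": "1", "total": 30}, {"id": "2", "total": 75}],
    "bob": [{"id": "1", "total": 120}],
    "carol": [{"id": "1", "total": 55}],
}

single, single_visited = query(orders_by_user, is_large, partition_key="alice")
cross, cross_visited = query(orders_by_user, is_large)
print(single_visited, cross_visited)  # 1 3
```

The gap between the two counts grows with the number of logical partitions in the container, which is why the partition key is worth including whenever the query allows it.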

To learn more about how Azure Cosmos DB manages partitions, see Logical partitions. (You don't need to understand these internal details to build or run your applications; they are included for curious readers.)

Choosing a partition key

The following guidelines can help you choose a partition key:

  • A single logical partition has an upper limit of 20 GB of storage.

  • Azure Cosmos containers have a minimum throughput of 400 request units per second (RU/s). When throughput is provisioned on a database, the minimum per container is 100 RU/s. Requests to the same partition key can't exceed the throughput that's allocated to a partition. If requests exceed the allocated throughput, they are rate-limited. It's therefore important to pick a partition key that doesn't result in "hot spots" within your application.

  • Choose a partition key that has a wide range of values and access patterns that are evenly spread across logical partitions. This helps spread the data and the activity in your container across the set of logical partitions, so that resources for data storage and throughput can be distributed across the logical partitions.

  • Choose a partition key that spreads the workload evenly across all partitions and evenly over time. Your choice of partition key should balance the need for efficient partition queries and transactions against the goal of distributing items across multiple partitions to achieve scalability.

  • Candidates for partition keys might include properties that appear frequently as a filter in your queries. Queries can be efficiently routed by including the partition key in the filter predicate.
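One way to apply these guidelines is to profile a sample of your data against each candidate key before committing to it. The sketch below is a minimal example under several assumptions: the function name and sample data are hypothetical, item count is used as a rough proxy for request volume, and the 20 GB limit is treated as 20 × 1024³ bytes.

```python
from collections import Counter

# The 20 GB logical partition limit, interpreted here as binary gigabytes (assumption).
LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024 ** 3

def summarize_partition_key(items, key_name):
    """Report how evenly a candidate partition key spreads a sample of items."""
    counts = Counter()
    sizes = Counter()
    for item in items:
        pk = item[key_name]
        counts[pk] += 1
        sizes[pk] += item["size_bytes"]
    hottest_key, hottest_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "hottest_key": hottest_key,
        # Share of all items that fall into the busiest logical partition.
        "hottest_share": hottest_count / sum(counts.values()),
        "over_storage_limit": [
            pk for pk, size in sizes.items() if size > LOGICAL_PARTITION_LIMIT_BYTES
        ],
    }

# Hypothetical sample: 100 items, unique user IDs, but 90% from one country.
sample = [
    {"user_id": f"user-{n}", "country": "US" if n < 90 else "NZ", "size_bytes": 2048}
    for n in range(100)
]

by_country = summarize_partition_key(sample, "country")
by_user = summarize_partition_key(sample, "user_id")
print(by_country["distinct_values"], by_country["hottest_share"])  # 2 0.9
print(by_user["distinct_values"], by_user["hottest_share"])        # 100 0.01
```

In this sample, country would concentrate 90% of the items in one logical partition, a likely hot spot, while user_id spreads them evenly. A real evaluation would also weigh how often each candidate appears as a query filter.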

Next steps