Partitioning in Azure Cosmos DB

Azure Cosmos DB uses partitioning to scale individual containers in a database to meet the performance needs of your application. In partitioning, the items in a container are divided into distinct subsets called logical partitions. Logical partitions are formed based on the value of a partition key that is associated with each item in a container. All items in a logical partition have the same partition key value.

For example, suppose a container holds items, and each item has a unique value for the UserID property. If UserID serves as the partition key for the items in the container and there are 1,000 unique UserID values, 1,000 logical partitions are created for the container.
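The UserID example can be sketched in a few lines of Python. This is only an illustration of how logical partitions follow from partition key values; the function name and the sample data are hypothetical, and nothing here calls Azure Cosmos DB.

```python
from collections import defaultdict

def group_into_logical_partitions(items, partition_key):
    """Group items by the value of their partition key property."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[partition_key]].append(item)
    return dict(partitions)

# Hypothetical data: 1,000 items, each with a unique UserID.
items = [{"id": str(i), "UserID": f"user-{i}"} for i in range(1000)]
logical_partitions = group_into_logical_partitions(items, "UserID")

# One logical partition per distinct partition key value.
print(len(logical_partitions))
```

If several items shared a UserID, they would land in the same logical partition and the count would drop accordingly.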

In addition to a partition key that determines the item's logical partition, each item in a container has an item ID, which is unique within a logical partition. Combining the partition key and the item ID creates the item's index, which uniquely identifies the item.
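A minimal sketch of this uniqueness rule, assuming a toy in-memory container keyed by the pair (partition key value, item ID). The `Container` class and its methods are hypothetical stand-ins for illustration, not the Azure Cosmos DB SDK.

```python
class Container:
    """Toy in-memory stand-in for a container (not the Cosmos DB SDK)."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self._items = {}

    def create_item(self, item):
        # The pair (partition key value, item ID) identifies the item.
        key = (item[self.partition_key], item["id"])
        if key in self._items:
            raise ValueError(f"conflict: {key} already exists")
        self._items[key] = item

    def read_item(self, item_id, partition_key_value):
        return self._items[(partition_key_value, item_id)]


container = Container("UserID")
container.create_item({"id": "1", "UserID": "alice"})
# Same item ID in a different logical partition: allowed.
container.create_item({"id": "1", "UserID": "bob"})

try:
    # Same item ID in the same logical partition: rejected.
    container.create_item({"id": "1", "UserID": "alice"})
except ValueError as err:
    print(err)
```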

Choosing a partition key is an important decision that will affect your application's performance.

Managing logical partitions

Azure Cosmos DB transparently and automatically manages the placement of logical partitions on physical partitions to efficiently satisfy the scalability and performance needs of the container. As the throughput and storage requirements of an application increase, Azure Cosmos DB moves logical partitions to automatically spread the load across a greater number of physical partitions. You can learn more about physical partitions.

Azure Cosmos DB uses hash-based partitioning to spread logical partitions across physical partitions. Azure Cosmos DB hashes the partition key value of an item, and the hashed result determines the physical partition. Azure Cosmos DB then allocates the key space of partition key hashes evenly across the physical partitions.
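The idea can be sketched as follows. The hash function that Azure Cosmos DB uses internally is not described here, so this sketch substitutes SHA-256 and assumes a 32-bit key space; both choices, along with the function names, are assumptions made only for illustration.

```python
import hashlib

KEY_SPACE = 2 ** 32  # assumed size of the hash key space

def hash_partition_key(value: str) -> int:
    """Map a partition key value to a point in the key space."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def physical_partition_for(value: str, physical_partition_count: int) -> int:
    """Pick the physical partition that owns the hash's key range."""
    range_size = KEY_SPACE // physical_partition_count
    index = hash_partition_key(value) // range_size
    # Clamp the remainder of the key space into the last partition.
    return min(index, physical_partition_count - 1)

# Hypothetical workload: 10,000 distinct UserID values over 4 partitions.
counts = {}
for i in range(10_000):
    p = physical_partition_for(f"user-{i}", 4)
    counts[p] = counts.get(p, 0) + 1
print(counts)
```

Because each physical partition owns an equal slice of the key space, a partition key with many distinct values tends to spread logical partitions roughly evenly, while every item with the same key value always maps to the same place.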

Transactions (in stored procedures or triggers) are allowed only against items in a single logical partition.

You can learn more about how Azure Cosmos DB manages partitions. (You don't need to understand these internal details to build or run your applications; they are included here for curious readers.)

Choosing a partition key

Selecting your partition key is a simple but important design choice in Azure Cosmos DB. Once you select your partition key, you can't change it in place. If you need to change your partition key, you should move your data to a new container with the new partition key.

For all containers, your partition key should:

  • Be a property whose value does not change. If a property is your partition key, you can't update that property's value.
  • Have a high cardinality. In other words, the property should have a wide range of possible values.
  • Spread request unit (RU) consumption and data storage evenly across all logical partitions. This ensures even RU consumption and storage distribution across your physical partitions.
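One rough way to compare candidate partition keys against the cardinality and evenness criteria is to look at a sample of your data. The sketch below is an assumption-laden illustration: the function name and sample data are hypothetical, and it uses item counts as a stand-in for storage, so it says nothing about RU consumption.

```python
from collections import Counter

def partition_key_stats(items, candidate_key):
    """Report cardinality and the share held by the largest logical partition."""
    counts = Counter(item[candidate_key] for item in items)
    total = sum(counts.values())
    return {
        "cardinality": len(counts),
        "largest_share": max(counts.values()) / total,
    }

# Hypothetical sample: 100 items, unique UserID, heavily skewed Country.
sample = [
    {"id": str(i), "UserID": f"user-{i}", "Country": "US" if i < 90 else "NZ"}
    for i in range(100)
]

for key in ("UserID", "Country"):
    print(key, partition_key_stats(sample, key))
```

In this sample, UserID has high cardinality and no dominant value, while Country has only two values and one of them holds most of the items, which would concentrate storage in a single logical partition.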

If you need multi-item ACID transactions in Azure Cosmos DB, you will need to use stored procedures or triggers. All JavaScript-based stored procedures and triggers are scoped to a single logical partition.

Partition keys for read-heavy containers

For most containers, the above criteria are all you need to consider when picking a partition key. For large read-heavy containers, however, you might want to choose a partition key that appears frequently as a filter in your queries. When a query includes the partition key in its filter predicate, it can be routed efficiently to only the relevant physical partitions.

If most of your workload's requests are queries and most of your queries have an equality filter on the same property, this property can be a good partition key choice. For example, if you frequently run a query that filters on UserID, then selecting UserID as the partition key would reduce the number of cross-partition queries.
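The routing difference can be sketched as below. This is a simplified model, not the service's actual query planner: the hash function (SHA-256), the 32-bit key space, and the function names are all assumptions for illustration.

```python
import hashlib

def physical_partition_for(value, physical_partition_count):
    """Assumed hash routing: equal key ranges over a 32-bit key space."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    range_size = 2 ** 32 // physical_partition_count
    index = int.from_bytes(digest[:4], "big") // range_size
    return min(index, physical_partition_count - 1)

def partitions_to_query(equality_filters, partition_key, physical_partition_count):
    """Return the physical partitions a query has to visit."""
    if partition_key in equality_filters:
        # Equality filter on the partition key: one partition is enough.
        value = equality_filters[partition_key]
        return [physical_partition_for(value, physical_partition_count)]
    # No partition key in the filter: fan out to every partition.
    return list(range(physical_partition_count))

# Hypothetical container partitioned on UserID with 10 physical partitions.
print(len(partitions_to_query({"UserID": "user-42"}, "UserID", 10)))
print(len(partitions_to_query({"City": "Paris"}, "UserID", 10)))
```

The first query touches a single physical partition; the second has to fan out to all ten, which is the cross-partition cost the text describes.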

However, if your container is small, you probably don't have enough physical partitions to need to worry about the performance impact of cross-partition queries. Most small containers in Azure Cosmos DB require only one or two physical partitions.

If your container could grow to more than a few physical partitions, then you should make sure you pick a partition key that minimizes cross-partition queries. Your container will require more than a few physical partitions when either of the following is true:

  • Your container will have over 30,000 RU/s provisioned
  • Your container will store over 100 GB of data
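As a quick planning aid, the two conditions above can be written as a single check. The thresholds come from the list above; the function and constant names are hypothetical, and the check only flags when to think about cross-partition queries rather than predicting an actual partition count.

```python
RU_THRESHOLD = 30_000       # provisioned throughput, from the list above
STORAGE_THRESHOLD_GB = 100  # stored data in GB, from the list above

def needs_many_physical_partitions(provisioned_ru, storage_gb):
    """True when either condition above holds for the planned container."""
    return provisioned_ru > RU_THRESHOLD or storage_gb > STORAGE_THRESHOLD_GB

# Hypothetical planning inputs.
print(needs_many_physical_partitions(provisioned_ru=10_000, storage_gb=20))
print(needs_many_physical_partitions(provisioned_ru=50_000, storage_gb=20))
print(needs_many_physical_partitions(provisioned_ru=10_000, storage_gb=250))
```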

Using item ID as the partition key

If your container has a property that has a wide range of possible values, it is likely a great partition key choice. One possible example of such a property is the item ID. For small read-heavy containers or write-heavy containers of any size, the item ID is naturally a great choice for the partition key.

The system property item ID is guaranteed to exist in every item in your Cosmos container. You may have other properties that represent a logical ID of your item. In many cases, these are also great partition key choices for the same reasons as the item ID.

The item ID is a great partition key choice for the following reasons:

  • There is a wide range of possible values (one unique item ID per item).
  • Because there is a unique item ID per item, the item ID does a great job of evenly balancing RU consumption and data storage.
  • You can easily do efficient point reads, because if you know an item's item ID, you always know its partition key.
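The point-read benefit can be illustrated with a toy in-memory store keyed by (partition key value, item ID). The store, the function name, and the sample items are hypothetical and do not use the Azure Cosmos DB SDK; the sketch only shows that knowing the item ID is enough to build the full lookup key.

```python
def point_read(store, item_id):
    """Point read when the partition key is the item ID itself."""
    partition_key_value = item_id  # known as soon as the item ID is known
    return store[(partition_key_value, item_id)]

# Hypothetical container partitioned on the item ID.
store = {}
for i in range(3):
    item = {"id": f"order-{i}", "total": 10 * i}
    store[(item["id"], item["id"])] = item

print(point_read(store, "order-2"))
```

With any other partition key, the caller would need to know both the item ID and the item's partition key value before doing the same direct lookup.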

Some things to consider when selecting the item ID as the partition key include:

  • If the item ID is the partition key, it becomes a unique identifier throughout your entire container. You won't be able to have two items with the same item ID.
  • If you have a read-heavy container that has a lot of physical partitions, queries will be more efficient if they have an equality filter with the item ID.
  • You can't run stored procedures or triggers across multiple logical partitions.

Next steps