Partitioning and horizontal scaling in Azure Cosmos DB

This article explains the relationship between logical and physical partitions. It also discusses best practices for partitioning and gives an in-depth view of how horizontal scaling works in Azure Cosmos DB. You don't need to understand these internal details to select your partition key, but they are covered here to clarify how Azure Cosmos DB works.

Logical partitions

A logical partition consists of a set of items that have the same partition key. For example, in a container that contains data about food nutrition, all items contain a foodGroup property. You can use foodGroup as the partition key for the container. Groups of items that have specific values for foodGroup, such as Beef Products, Baked Products, and Sausages and Luncheon Meats, form distinct logical partitions. You don't have to worry about deleting a logical partition when the underlying data is deleted.
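As a minimal sketch of this grouping, the following Python snippet buckets a few made-up items by their foodGroup value. The sample items and the helper name `group_by_partition_key` are illustrative assumptions and not part of Azure Cosmos DB; the service performs the equivalent grouping internally.

```python
from collections import defaultdict


def group_by_partition_key(items, key):
    """Group items into logical partitions, one per distinct partition key value."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item)
    return dict(partitions)


# Hypothetical sample items; only the foodGroup property matters here.
items = [
    {"id": "1", "foodGroup": "Beef Products"},
    {"id": "2", "foodGroup": "Baked Products"},
    {"id": "3", "foodGroup": "Beef Products"},
    {"id": "4", "foodGroup": "Sausages and Luncheon Meats"},
]

logical_partitions = group_by_partition_key(items, "foodGroup")
for value, members in logical_partitions.items():
    print(value, [member["id"] for member in members])
```

Four items with three distinct foodGroup values yield three logical partitions, and both Beef Products items land in the same one.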

A logical partition also defines the scope of database transactions. You can update items within a logical partition by using a transaction with snapshot isolation. When new items are added to a container, the system transparently creates new logical partitions.

There is no limit to the number of logical partitions in your container. Each logical partition can store up to 20 GB of data. Good partition key choices have a wide range of possible values. For example, in a container where all items contain a foodGroup property, the data within the Beef Products logical partition can grow up to 20 GB. Selecting a partition key with a wide range of possible values ensures that the container is able to scale.

Physical partitions

An Azure Cosmos container is scaled by distributing data and throughput across physical partitions. Internally, one or more logical partitions are mapped to a single physical partition. Most small Cosmos containers have many logical partitions but only require a single physical partition. Unlike logical partitions, physical partitions are an internal implementation of the system and are entirely managed by Azure Cosmos DB.

The number of physical partitions in your Cosmos container depends on the following:

  • Amount of provisioned throughput (each individual physical partition can provide a throughput of up to 10,000 request units per second)
  • Total data storage (each individual physical partition can store up to 50 GB)
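The two limits above imply a lower bound on the number of physical partitions a container needs. The sketch below computes that bound; the function name `min_physical_partitions` and the sample inputs are assumptions for illustration, and the actual partition count is decided by Azure Cosmos DB and may be higher.

```python
import math

MAX_RU_PER_PHYSICAL_PARTITION = 10_000  # RU/s per physical partition, from the list above
MAX_GB_PER_PHYSICAL_PARTITION = 50      # GB per physical partition, from the list above


def min_physical_partitions(provisioned_ru, stored_gb):
    """Return the smallest partition count that satisfies both per-partition limits."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(stored_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    return max(1, by_throughput, by_storage)


print(min_physical_partitions(18_000, 30))   # throughput-bound: prints 2
print(min_physical_partitions(5_000, 120))   # storage-bound: prints 3
```

Whichever dimension needs more partitions determines the bound: 18,000 RU/s needs at least two partitions regardless of data size, and 120 GB needs at least three regardless of throughput.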

There is no limit to the total number of physical partitions in your container. As your provisioned throughput or data size grows, Azure Cosmos DB automatically creates new physical partitions by splitting existing ones. Physical partition splits do not impact your application's availability. After a physical partition split, all data within a single logical partition is still stored on the same physical partition. A physical partition split simply creates a new mapping of logical partitions to physical partitions.
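One way to picture a split is as a remapping of partition key values to physical partitions. The sketch below assumes each physical partition owns a contiguous range of hash values and that a split halves one range. This hash-range scheme, the SHA-256 hash, and the helper names are assumptions chosen for illustration; the text above does not describe how Azure Cosmos DB computes the mapping.

```python
import hashlib

HASH_SPACE = 2 ** 32  # size of the hypothetical hash space


def key_hash(partition_key_value):
    """Map a partition key value to an integer in [0, HASH_SPACE)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(ranges, partition_key_value):
    """Return the index of the physical partition whose range contains the key's hash."""
    h = key_hash(partition_key_value)
    for index, (low, high) in enumerate(ranges):
        if low <= h < high:
            return index
    raise ValueError("hash falls outside every range")


def split(ranges, index):
    """Split one physical partition's hash range into two halves."""
    low, high = ranges[index]
    mid = (low + high) // 2
    return ranges[:index] + [(low, mid), (mid, high)] + ranges[index + 1:]


keys = ["Beef Products", "Baked Products", "Sausages and Luncheon Meats"]
before = [(0, HASH_SPACE)]   # a single physical partition
after = split(before, 0)     # two physical partitions after the split

for key in keys:
    print(key, ":", owner(before, key), "->", owner(after, key))
```

Because every item in a logical partition shares one partition key value, it shares one hash, so the whole logical partition lands in exactly one of the two new ranges. Only the mapping changes.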

Throughput provisioned for a container is divided evenly among physical partitions. A partition key design that doesn't distribute throughput requests evenly might create "hot" partitions. Hot partitions might result in rate limiting, inefficient use of the provisioned throughput, and higher costs.

You can see your container's physical partitions in the Storage section of the Metrics blade of the Azure portal:

Image: Viewing the number of physical partitions

In this example container, where we have chosen /foodGroup as our partition key, each of the three rectangles represents a physical partition. In the image, a partition key range is the same as a physical partition. The selected physical partition contains three logical partitions: Beef Products, Vegetable and Vegetable Products, and Soups, Sauces, and Gravies.

If we provision a throughput of 18,000 request units per second (RU/s), then each of the three physical partitions can utilize 1/3 of the total provisioned throughput. Within the selected physical partition, the logical partition keys Beef Products, Vegetable and Vegetable Products, and Soups, Sauces, and Gravies can, collectively, utilize the physical partition's 6,000 provisioned RU/s. Because provisioned throughput is evenly divided across your container's physical partitions, it's important to choose a partition key that evenly distributes throughput consumption across logical partitions. Doing so ensures that throughput consumption across physical partitions is balanced.
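The arithmetic in this example can be sketched in a few lines. The per-key consumption figures, the placement of Baked Products and Sausages and Luncheon Meats on the other two partitions, and the helper names are all hypothetical; only the 18,000 RU/s total, the three physical partitions, and the three keys on the selected partition come from the example above.

```python
def per_partition_budget(provisioned_ru, physical_partition_count):
    """Provisioned throughput is divided evenly among physical partitions."""
    return provisioned_ru / physical_partition_count


def hot_partitions(consumption_by_key, key_to_partition, budget):
    """Return physical partitions whose combined consumption exceeds the budget."""
    usage = {}
    for key, ru in consumption_by_key.items():
        partition = key_to_partition[key]
        usage[partition] = usage.get(partition, 0) + ru
    return {partition: ru for partition, ru in usage.items() if ru > budget}


budget = per_partition_budget(18_000, 3)

# Hypothetical mapping of logical partition keys to physical partition indexes.
key_to_partition = {
    "Beef Products": 0,
    "Vegetable and Vegetable Products": 0,
    "Soups, Sauces, and Gravies": 0,
    "Baked Products": 1,
    "Sausages and Luncheon Meats": 2,
}

# Hypothetical RU/s consumed by requests against each logical partition.
consumption = {
    "Beef Products": 4_000,
    "Vegetable and Vegetable Products": 1_500,
    "Soups, Sauces, and Gravies": 1_000,
    "Baked Products": 2_000,
    "Sausages and Luncheon Meats": 500,
}

print(budget)                                                 # prints 6000.0
print(hot_partitions(consumption, key_to_partition, budget))  # prints {0: 6500}
```

In this made-up workload the container consumes only 9,000 of its 18,000 RU/s in total, yet physical partition 0 exceeds its 6,000 RU/s share and would be rate limited, which is the hot partition problem described earlier.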

Replica sets

Each physical partition consists of a set of replicas, also referred to as a replica set. Each replica set hosts an instance of the Azure Cosmos database engine. A replica set makes the data stored within the physical partition durable, highly available, and consistent. Each replica that makes up the physical partition inherits the partition's storage quota. All replicas of a physical partition collectively support the throughput that's allocated to the physical partition. Azure Cosmos DB automatically manages replica sets.

Most small Cosmos containers require only a single physical partition but will still have at least 4 replicas.

The following image shows how logical partitions are mapped to physical partitions that are distributed across multiple regions:

Image: An illustration of Azure Cosmos DB partitioning

Next steps