Partitioning in Azure Cosmos DB Cassandra API

Applies to: Cassandra API

This article describes how partitioning works in Azure Cosmos DB Cassandra API.

Cassandra API uses partitioning to scale the individual tables in a keyspace to meet the performance needs of your application. Partitions are formed based on the value of a partition key that is associated with each record in a table. All the records in a partition have the same partition key value. Azure Cosmos DB transparently and automatically manages the placement of partitions across physical resources to efficiently satisfy the scalability and performance needs of the table. As the throughput and storage requirements of an application increase, Azure Cosmos DB moves and balances the data across a greater number of physical machines.

From the developer perspective, partitioning behaves in the same way for Azure Cosmos DB Cassandra API as it does in native Apache Cassandra. However, there are some differences behind the scenes.

Differences between Apache Cassandra and Azure Cosmos DB

In Azure Cosmos DB, each machine on which partitions are stored is itself referred to as a physical partition. A physical partition is akin to a virtual machine: a dedicated compute unit, or set of physical resources. Each partition stored on this compute unit is referred to as a logical partition in Azure Cosmos DB. If you are already familiar with Apache Cassandra, you can think of logical partitions in the same way that you think of regular partitions in Cassandra.

Apache Cassandra recommends a 100-MB limit on the amount of data stored in a partition. The Cassandra API for Azure Cosmos DB allows up to 20 GB of data per logical partition, and up to 30 GB of data per physical partition. In Azure Cosmos DB, unlike Apache Cassandra, the compute capacity available in a physical partition is expressed using a single metric called request units (RUs), which allows you to think of your workload in terms of requests (reads or writes) per second, rather than cores, memory, or IOPS. This can make capacity planning more straightforward once you understand the cost of each request. Each physical partition can have up to 10,000 RUs of compute available to it. You can learn more about scalability options by reading our article on elastic scale in Cassandra API.
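As a rough illustration of how the two per-partition limits interact, the following back-of-the-envelope bound estimates the smallest number of physical partitions a table could occupy. The symbols T (provisioned throughput in RUs), S (stored data in GB), and N (number of physical partitions) are introduced here for illustration only, and the sample values are made up. The service decides the actual layout, which may use more partitions than this lower bound.

```latex
\[
N \;\ge\; \max\!\left( \left\lceil \frac{T}{10000} \right\rceil,\ \left\lceil \frac{S}{30} \right\rceil \right)
\]
\[
T = 25000,\ S = 100 \;\Rightarrow\; N \;\ge\; \max(3,\ 4) = 4
\]
```

In this made-up case, storage rather than throughput determines the bound.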

In Azure Cosmos DB, each physical partition consists of a set of replicas, also known as a replica set, with at least 4 replicas per partition. This is in contrast to Apache Cassandra, where setting a replication factor of 1 is possible. However, a replication factor of 1 leads to low availability if the only node with the data goes down. In Cassandra API, there is always a replication factor of 4 (quorum of 3). Azure Cosmos DB automatically manages replica sets, while in Apache Cassandra they need to be maintained using various tools.

Apache Cassandra has a concept of tokens, which are hashes of partition keys. The tokens are based on a murmur3 64-bit hash, with values ranging from -2^63 to 2^63 - 1. This range is commonly referred to as the "token ring" in Apache Cassandra. The token ring is split into token ranges, and these ranges are divided amongst the nodes present in a native Apache Cassandra cluster. Partitioning for Azure Cosmos DB is implemented in a similar way, except that it uses a different hash algorithm and has a larger internal token ring. Externally, however, we expose the same token range as Apache Cassandra, that is, -2^63 to 2^63 - 1.
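To make the idea of a token concrete, the sketch below uses the CQL token function, which in Apache Cassandra returns the hash of a row's partition key. The keyspace ks, the table events, and the column pk are hypothetical names used only for illustration, and it is an assumption that the function behaves the same way in Cassandra API.

```cql
-- Hypothetical table ks.events whose partition key column is pk.
-- token(pk) returns the token (hash) computed from the partition key;
-- each value falls within the exposed range -2^63 to 2^63 - 1.
SELECT pk, token(pk)
FROM ks.events;
```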

Primary key

All tables in Cassandra API must have a primary key defined. The syntax for a primary key is shown below:

column_name cql_type_definition PRIMARY KEY

Suppose we want to create a user table, which stores messages for different users:

CREATE TABLE uprofile.user ( 
   id UUID PRIMARY KEY, 
   user text,  
   message text);

In this design, we have defined the id field as the primary key. The primary key functions as the identifier for the record in the table, and it is also used as the partition key in Azure Cosmos DB. If the primary key is defined in this way, there will only be a single record in each partition. This results in a perfectly horizontal and scalable distribution when writing data to the database, and is ideal for key-value lookup use cases. The application should provide the primary key whenever reading data from the table, to maximize read performance.
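A key-value lookup against this design might look like the following sketch. It assumes the id column is of type UUID, and the literal value is made up for illustration.

```cql
-- Point read: the primary key (which is also the partition key) is supplied,
-- so the request targets a single partition holding a single record.
SELECT user, message
FROM uprofile.user
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```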


Compound primary key

Apache Cassandra also has a concept of compound keys. A compound primary key consists of more than one column; the first column is the partition key, and any additional columns are the clustering keys. The syntax for a compound primary key is shown below:

PRIMARY KEY (partition_key_column_name, clustering_column_name [, ...])

Suppose we want to change the above design and make it possible to efficiently retrieve messages for a given user:

CREATE TABLE uprofile.user (
   user text,  
   id int, 
   message text, 
   PRIMARY KEY (user, id));

In this design, we are now defining user as the partition key, and id as the clustering key. You can define as many clustering keys as you wish, but each value (or combination of values) for the clustering key must be unique for multiple records to be added to the same partition, for example:

insert into uprofile.user (user, id, message) values ('theo', 1, 'hello');
insert into uprofile.user (user, id, message) values ('theo', 2, 'hello again');

When data is returned, it is sorted by the clustering key, as expected in Apache Cassandra.


With data modeled in this way, multiple records can be assigned to each partition, grouped by user. We can thus issue a query that is efficiently routed by the partition key (in this case, user) to get all the messages for a given user.
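As a minimal sketch of such a query against the compound-key table defined above, using the two rows inserted for 'theo':

```cql
-- The partition key (user) is supplied, so the query is routed to one partition.
-- Rows in that partition come back ordered by the clustering key (id),
-- ascending by default.
SELECT id, message
FROM uprofile.user
WHERE user = 'theo';
```

In Apache Cassandra, appending ORDER BY id DESC to a query like this reverses the clustering order; treat support for that clause in Cassandra API as an assumption to verify.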


Composite partition key

Composite partition keys work essentially the same way as compound keys, except that you can specify multiple columns as a composite partition key. The syntax of composite partition keys is shown below:

PRIMARY KEY (
   (partition_key_column_name[, ...]), 
    clustering_column_name [, ...]);

For example, you can have the following, where the unique combination of firstname and lastname forms the partition key, and id is the clustering key:

CREATE TABLE uprofile.user ( 
   firstname text, 
   lastname text,
   id int,  
   message text, 
   PRIMARY KEY ((firstname, lastname), id) );
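The statements below sketch how such a table might be written to and read from. The sample names and messages are made up for illustration. In Apache Cassandra, a query of this kind has to supply every column of the composite partition key so that the partition can be located.

```cql
-- Two records in the same partition: same (firstname, lastname), different id.
INSERT INTO uprofile.user (firstname, lastname, id, message)
VALUES ('theo', 'smith', 1, 'hello');
INSERT INTO uprofile.user (firstname, lastname, id, message)
VALUES ('theo', 'smith', 2, 'hello again');

-- Both partition key columns are provided, so the query targets one partition.
SELECT id, message
FROM uprofile.user
WHERE firstname = 'theo' AND lastname = 'smith';
```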

Next steps