Create a synthetic partition key

It's a best practice to have a partition key with many distinct values, such as hundreds or thousands. The goal is to distribute your data and workload evenly across the items associated with these partition key values. If such a property doesn't exist in your data, you can construct a synthetic partition key. This document describes several basic techniques for generating a synthetic partition key for your Cosmos container.

Concatenate multiple properties of an item

You can form a partition key by concatenating multiple property values into a single artificial partitionKey property. These keys are referred to as synthetic keys. For example, consider the following document:

{
"deviceId": "abc-123",
"date": 2018
}

For the previous document, one option is to set /deviceId or /date as the partition key. Use this option if you want to partition your container based on either device ID or date. Another option is to concatenate these two values into a synthetic partitionKey property that's used as the partition key.

{
"deviceId": "abc-123",
"date": 2018,
"partitionKey": "abc-123-2018"
}

In real-time scenarios, you can have thousands of items in a database. Instead of adding the synthetic key manually, define client-side logic to concatenate values and insert the synthetic key into the items in your Cosmos containers.
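The client-side logic described above could be sketched as follows. This is a minimal Python illustration and not part of any Cosmos SDK: the helper name with_synthetic_key, the fields tuple, and the "-" separator are assumptions chosen to reproduce the example document.

```python
def with_synthetic_key(item, fields=("deviceId", "date"), separator="-"):
    """Return a copy of `item` with a synthetic partitionKey property.

    Hypothetical helper: the field names and separator are assumptions
    that match the example document, not a prescribed convention.
    """
    enriched = dict(item)  # copy so the caller's dict is not mutated
    enriched["partitionKey"] = separator.join(str(item[f]) for f in fields)
    return enriched


# Build the item to write, with the synthetic key already attached.
doc = with_synthetic_key({"deviceId": "abc-123", "date": 2018})
```

The enriched item can then be passed to whichever write call your SDK provides, with /partitionKey configured as the container's partition key path.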

Use a partition key with a random suffix

Another possible strategy to distribute the workload more evenly is to append a random number at the end of the partition key value. When you distribute items in this way, you can perform parallel write operations across partitions.

For example, if a partition key represents a date, you might choose a random number between 1 and 400 and concatenate it as a suffix to the date. This method results in partition key values like 2018-08-09.1, 2018-08-09.2, and so on, through 2018-08-09.400. Because you randomize the partition key, the write operations on the container on each day are spread evenly across multiple partitions. This method results in better parallelism and overall higher throughput.
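A random suffix of this kind might be generated as in the sketch below. The function name random_suffix_key and its parameters are hypothetical; the range of 1 to 400 and the "." separator follow the example values above.

```python
import random


def random_suffix_key(date_str, suffix_count=400):
    """Append a random suffix in [1, suffix_count] to a date string.

    Hypothetical helper for illustration; suffix_count=400 mirrors the
    example in the text and is not a recommended value.
    """
    suffix = random.randint(1, suffix_count)  # both endpoints are inclusive
    return f"{date_str}.{suffix}"


# Each write for the same day lands on one of 400 possible key values.
key = random_suffix_key("2018-08-09")
```

Because the suffix is not derived from the item, a reader cannot recompute it later, which is the limitation the next section addresses.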

Use a partition key with pre-calculated suffixes

The random suffix strategy can greatly improve write throughput, but it makes it difficult to read a specific item, because you don't know the suffix value that was used when the item was written. To make it easier to read individual items, use the pre-calculated suffixes strategy. Instead of using a random number to distribute the items among the partitions, use a number that is calculated based on something that you want to query.

Consider the previous example, where a container uses a date as the partition key. Now suppose that each item has a Vehicle-Identification-Number (VIN) attribute that you want to access. Further, suppose that you often run queries to find items by VIN, in addition to date. Before your application writes the item to the container, it can calculate a hash suffix based on the VIN and append it to the partition key date. The calculation might generate a number between 1 and 400 that is evenly distributed. This result is similar to the results produced by the random suffix strategy. The partition key value is then the date concatenated with the calculated result.

With this strategy, the writes are evenly spread across the partition key values, and across the partitions. You can easily read a particular item and date, because you can calculate the partition key value for a specific VIN. The benefit of this method is that you can avoid creating a single hot partition key, that is, a partition key that takes all the workload.
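One way to compute such a suffix is sketched below, assuming a SHA-256 hash of the VIN reduced to the range 1 to 400. The function name precalculated_suffix_key, the choice of SHA-256, and the sample VIN are assumptions for illustration; the text does not prescribe a hash function. A stable hash is used because Python's built-in hash() for strings varies between processes by default, so a reader could not reproduce the writer's suffix with it.

```python
import hashlib


def precalculated_suffix_key(date_str, vin, suffix_count=400):
    """Derive a deterministic suffix in [1, suffix_count] from the VIN.

    Hypothetical helper: SHA-256 is an assumed choice of stable hash.
    """
    digest = hashlib.sha256(vin.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it into 1..suffix_count.
    suffix = int.from_bytes(digest[:8], "big") % suffix_count + 1
    return f"{date_str}.{suffix}"


# The writer and the reader compute the same key from the same inputs.
sample_vin = "1HGCM82633A004352"  # made-up VIN for illustration
write_key = precalculated_suffix_key("2018-08-09", sample_vin)
read_key = precalculated_suffix_key("2018-08-09", sample_vin)
```

To read an item for a known VIN and date, the application recomputes the key in the same way and targets that single partition key value.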

Next steps

You can learn more about the partitioning concept in the following articles: