Indexing policies in Azure Cosmos DB

In Azure Cosmos DB, every container has an indexing policy that dictates how the container's items are indexed. The default indexing policy for newly created containers indexes every property of every item, enforcing range indexes for any string or number, and spatial indexes for any GeoJSON object of type Point. This gives you high query performance without having to think about indexing and index management upfront.

In some situations, you may want to override this automatic behavior to better suit your requirements. You can customize a container's indexing policy by setting its indexing mode and by including or excluding property paths.

Note

The method of updating indexing policies described in this article applies only to Azure Cosmos DB's SQL (Core) API.

Indexing mode

Azure Cosmos DB supports two indexing modes:

  • Consistent: The index is updated synchronously as you create, update, or delete items. This means that the consistency of your read queries will be the consistency configured for the account.
  • None: Indexing is disabled on the container. This mode is commonly used when a container serves as a pure key-value store with no need for secondary indexes. It can also be used to improve the performance of bulk operations. After the bulk operations are complete, the indexing mode can be set to Consistent and then monitored using IndexTransformationProgress until complete.

Note

Azure Cosmos DB also supports a Lazy indexing mode. Lazy indexing performs updates to the index at a much lower priority level, when the engine is not doing any other work. This can result in inconsistent or incomplete query results. If you plan to query a Cosmos container, you should not select lazy indexing. In June 2020, we introduced a change that no longer allows new containers to be set to Lazy indexing mode. If your Azure Cosmos DB account already contains at least one container with lazy indexing, the account is automatically exempt from the change. You can also request an exemption by contacting Azure support.

By default, the indexing policy is set to automatic. This is achieved by setting the automatic property in the indexing policy to true. Setting this property to true allows Azure Cosmos DB to automatically index documents as they are written.
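As a rough illustration, a policy that keeps the automatic behavior described above might look like the following. This is a sketch, not an authoritative definition: it assumes the JSON field names indexingMode, automatic, includedPaths, and excludedPaths, and the variable name automatic_policy is purely illustrative.

```python
import json

# Illustrative policy: consistent mode with automatic indexing of every path.
# Field names are assumed to follow the JSON indexing policy format.
automatic_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
}

# Serialize to the JSON form in which a policy is usually written.
policy_json = json.dumps(automatic_policy, indent=4)
print(policy_json)
```

Python's True is serialized as the JSON literal true, which is the value the automatic property expects.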

Including and excluding property paths

A custom indexing policy can specify property paths that are explicitly included in or excluded from indexing. By optimizing the number of indexed paths, you can lower the amount of storage used by your container and improve the latency of write operations. These paths are defined following the method described in the indexing overview section, with the following additions:

  • a path leading to a scalar value (string or number) ends with /?
  • elements from an array are addressed together through the /[] notation (instead of /0, /1, etc.)
  • the /* wildcard can be used to match any elements below the node

Taking the same example again:

{
    "locations": [
        { "country": "Germany", "city": "Berlin" },
        { "country": "France", "city": "Paris" }
    ],
    "headquarters": { "country": "Belgium", "employees": 250 },
    "exports": [
        { "city": "Moscow" },
        { "city": "Athens" }
    ]
}

  • the headquarters' employees path is /headquarters/employees/?

  • the locations' country path is /locations/[]/country/?

  • the path to anything under headquarters is /headquarters/*

For example, we could include the /headquarters/employees/? path. This path would ensure that we index the employees property but would not index additional nested JSON within this property.
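To make the path notation concrete, the sketch below walks the example item and prints the path of every scalar value, using /[] for array elements and /? as the scalar suffix. The helper name scalar_paths is hypothetical and exists only for illustration; it is not part of any Azure SDK.

```python
def scalar_paths(node, prefix=""):
    """Yield the index path of every scalar value reachable from node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from scalar_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        # All array elements share the same /[] segment.
        for element in node:
            yield from scalar_paths(element, f"{prefix}/[]")
    else:
        # Strings and numbers terminate the path with /?
        yield f"{prefix}/?"


item = {
    "locations": [
        {"country": "Germany", "city": "Berlin"},
        {"country": "France", "city": "Paris"},
    ],
    "headquarters": {"country": "Belgium", "employees": 250},
    "exports": [{"city": "Moscow"}, {"city": "Athens"}],
}

# Deduplicate, because every array element produces the same path.
paths = sorted(set(scalar_paths(item)))
for path in paths:
    print(path)
```

The output includes /headquarters/employees/? and /locations/[]/country/?, matching the paths listed above.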

Include/exclude strategy

Any indexing policy has to include the root path /* as either an included or an excluded path.

  • Include the root path to selectively exclude paths that don't need to be indexed. This is the recommended approach because it lets Azure Cosmos DB proactively index any new property that may be added to your model.

  • Exclude the root path to selectively include paths that need to be indexed.

  • For paths made up of regular characters (alphanumeric characters and the underscore _), you don't have to wrap the property name in escaped double quotes (for example, "/path/?"). For paths with other special characters, you need to wrap the property name in escaped double quotes (for example, "/\"path-abc\"/?"). If you expect special characters in your paths, you can escape every path for safety. Functionally, it makes no difference whether you escape every path or only the ones that contain special characters.

  • The system property _etag is excluded from indexing by default, unless _etag is added to the included paths.

  • If the indexing mode is set to Consistent, the system properties id and _ts are automatically indexed.
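The two strategies above can be sketched as a pair of policies. This is a minimal illustration that assumes the JSON field names includedPaths and excludedPaths; the property paths are taken from the earlier example item, and the helper has_root_path is hypothetical.

```python
import json

# Opt-out strategy: include the root path, then exclude what is not needed.
opt_out_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/headquarters/employees/?"}],
}

# Opt-in strategy: exclude the root path, then include only what is needed.
# The second included path shows a property name with a special character
# (a hyphen), wrapped in double quotes.
opt_in_policy = {
    "indexingMode": "consistent",
    "includedPaths": [
        {"path": "/locations/[]/country/?"},
        {"path": '/"path-abc"/?'},
    ],
    "excludedPaths": [{"path": "/*"}],
}


def has_root_path(policy):
    """Check that /* appears as either an included or an excluded path."""
    entries = policy.get("includedPaths", []) + policy.get("excludedPaths", [])
    return any(entry["path"] == "/*" for entry in entries)


print(has_root_path(opt_out_policy), has_root_path(opt_in_policy))
print(json.dumps(opt_in_policy["includedPaths"][1]))
```

When the policy is serialized to JSON, the inner double quotes are escaped with backslashes, which is the form shown in the escaping rule above.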

When including and excluding paths, you may encounter the following attributes:

  • kind can be either range or hash. Range index functionality provides all of the functionality of a hash index, so we recommend using a range index.

  • precision is a number defined at the index level for included paths. A value of -1 indicates maximum precision. We recommend always setting this value to -1.

  • dataType can be either String or Number. This indicates the types of JSON properties that will be indexed.

When not specified, these properties have the following default values:

Property Name | Default Value
kind | range
precision | -1
dataType | String and Number

See this section for indexing policy examples for including and excluding paths.

Include/exclude precedence

If your included paths and excluded paths conflict, the more precise path takes precedence.

Here's an example:

Included Path: /food/ingredients/nutrition/*

Excluded Path: /food/ingredients/*

In this case, the included path takes precedence over the excluded path because it is more precise. Based on these paths, any data in the /food/ingredients path or nested within it would be excluded from the index. The exception would be data within the included path /food/ingredients/nutrition/*, which would be indexed.

Here are some rules for included and excluded path precedence in Azure Cosmos DB:

  • Deeper paths are more precise than shallower paths. For example, /a/b/? is more precise than /a/?.

  • The /? suffix is more precise than /*. For example, /a/? is more precise than /a/*, so /a/? takes precedence.

  • The path /* must be either an included path or an excluded path.
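The precedence rules above can be approximated with a small model. This is a simplified sketch for reasoning about a policy, not the algorithm the service actually runs: it assumes that precision can be ranked by path depth first and by the /? suffix second, and the function names covers, precision, and is_indexed are hypothetical.

```python
def covers(policy_path, item_path):
    """Return True if a policy path applies to a scalar item path."""
    if policy_path.endswith("/*"):
        # "/a/*" covers everything below /a; "/*" covers everything.
        return item_path.startswith(policy_path[:-1])
    return policy_path == item_path


def precision(policy_path):
    """Rank a path: deeper beats shallower, and /? beats /* at equal depth."""
    return (policy_path.count("/"), policy_path.endswith("/?"))


def is_indexed(item_path, included, excluded):
    """Decide whether item_path is indexed under the most precise matching path."""
    candidates = [(precision(p), True) for p in included if covers(p, item_path)]
    candidates += [(precision(p), False) for p in excluded if covers(p, item_path)]
    if not candidates:
        raise ValueError("the policy must contain the root path /*")
    return max(candidates, key=lambda candidate: candidate[0])[1]


included = ["/*", "/food/ingredients/nutrition/*"]
excluded = ["/food/ingredients/*"]

print(is_indexed("/food/ingredients/nutrition/calories/?", included, excluded))
print(is_indexed("/food/ingredients/name/?", included, excluded))
```

With these paths, values under /food/ingredients/nutrition are indexed and other values under /food/ingredients are not, which mirrors the example above.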

Spatial indexes

When you define a spatial path in the indexing policy, you should define which index type should be applied to that path. Possible types for spatial indexes include:

  • Point

  • Polygon

  • MultiPolygon

  • LineString

By default, Azure Cosmos DB does not create any spatial indexes. If you would like to use spatial SQL built-in functions, you should create a spatial index on the required properties. See this section for indexing policy examples for adding spatial indexes.
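A policy that adds a spatial index might be sketched as follows. It assumes the JSON field name spatialIndexes with path and types entries; the /location property is a hypothetical example, and the helper invalid_spatial_types is illustrative only.

```python
# The four spatial types listed above.
SPATIAL_TYPES = {"Point", "Polygon", "MultiPolygon", "LineString"}

# Illustrative policy: /location is a hypothetical GeoJSON property.
spatial_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
    "spatialIndexes": [
        {"path": "/location/*", "types": ["Point", "Polygon"]},
    ],
}


def invalid_spatial_types(policy):
    """Return any spatial types in the policy that are not in the list above."""
    return [
        spatial_type
        for index in policy.get("spatialIndexes", [])
        for spatial_type in index["types"]
        if spatial_type not in SPATIAL_TYPES
    ]


print(invalid_spatial_types(spatial_policy))
```

A check like this can catch a misspelled type name before the policy is submitted.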

Composite indexes

Queries that have an ORDER BY clause with two or more properties require a composite index. You can also define a composite index to improve the performance of many equality and range queries. By default, no composite indexes are defined, so you should add composite indexes as needed.

Unlike with included or excluded paths, you can't create a composite path with the /* wildcard. Every composite path has an implicit /? at the end that you don't need to specify. Composite paths lead to a scalar value, and this is the only value that is included in the composite index.

When defining a composite index, you specify:

  • Two or more property paths. The sequence in which property paths are defined matters.

  • The order (ascending or descending).

Note

When you add a composite index, queries will utilize existing range indexes until the new composite index addition is complete. Therefore, when you add a composite index, you may not immediately observe performance improvements. It is possible to track the progress of index transformation by using one of the SDKs.

ORDER BY queries on multiple properties

The following considerations apply when using composite indexes for queries with an ORDER BY clause with two or more properties:

  • If the composite index paths do not match the sequence of the properties in the ORDER BY clause, then the composite index can't support the query.

  • The order of composite index paths (ascending or descending) should also match the order in the ORDER BY clause.

  • The composite index also supports an ORDER BY clause with the opposite order on all paths.

Consider the following example where a composite index is defined on properties name, age, and timestamp:

Composite Index | Sample ORDER BY Query | Supported by Composite Index?
(name ASC, age ASC) | SELECT * FROM c ORDER BY c.name ASC, c.age ASC | Yes
(name ASC, age ASC) | SELECT * FROM c ORDER BY c.age ASC, c.name ASC | No
(name ASC, age ASC) | SELECT * FROM c ORDER BY c.name DESC, c.age DESC | Yes
(name ASC, age ASC) | SELECT * FROM c ORDER BY c.name ASC, c.age DESC | No
(name ASC, age ASC, timestamp ASC) | SELECT * FROM c ORDER BY c.name ASC, c.age ASC, c.timestamp ASC | Yes
(name ASC, age ASC, timestamp ASC) | SELECT * FROM c ORDER BY c.name ASC, c.age ASC | No

You should customize your indexing policy so you can serve all necessary ORDER BY queries.
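The ORDER BY rules above can be expressed as a short check. This is a sketch under stated assumptions: the policy fragment uses the JSON field name compositeIndexes with path and order entries, and supports_order_by is a hypothetical helper that models the rules in this section rather than the query engine itself.

```python
# Illustrative policy fragment defining a composite index on (name ASC, age ASC).
composite_policy_fragment = {
    "compositeIndexes": [
        [
            {"path": "/name", "order": "ascending"},
            {"path": "/age", "order": "ascending"},
        ]
    ]
}


def supports_order_by(composite_index, order_by):
    """Model the rules above: same paths in the same sequence, with the
    orders either all matching or all reversed."""
    index_paths = [entry["path"] for entry in composite_index]
    query_paths = [path for path, _ in order_by]
    if index_paths != query_paths:
        return False
    matches = [
        entry["order"] == order
        for entry, (_, order) in zip(composite_index, order_by)
    ]
    return all(matches) or not any(matches)


name_age = composite_policy_fragment["compositeIndexes"][0]

# ORDER BY c.name DESC, c.age DESC: every order is reversed.
print(supports_order_by(name_age, [("/name", "descending"), ("/age", "descending")]))
# ORDER BY c.name ASC, c.age DESC: only one order is reversed.
print(supports_order_by(name_age, [("/name", "ascending"), ("/age", "descending")]))
```

The first call corresponds to the third row of the table and the second call to the fourth row.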

Queries with filters on multiple properties

If a query has filters on two or more properties, it may be helpful to create a composite index for these properties.

For example, consider the following query, which has an equality filter on two properties:

SELECT * FROM c WHERE c.name = "John" AND c.age = 18

This query will be more efficient, taking less time and consuming fewer RUs, if it is able to leverage a composite index on (name ASC, age ASC).

Queries with range filters can also be optimized with a composite index. However, the query can only have a single range filter. Range filters include >, <, <=, >=, and !=. The range filter should be defined last in the composite index.

Consider the following query with both equality and range filters:

SELECT * FROM c WHERE c.name = "John" AND c.age > 18

This query will be more efficient with a composite index on (name ASC, age ASC). However, the query would not utilize a composite index on (age ASC, name ASC) because the equality filters must be defined first in the composite index.

The following considerations apply when creating composite indexes for queries with filters on multiple properties:

  • The properties in the query's filter should match those in the composite index. If a property is in the composite index but is not included in the query as a filter, the query will not utilize the composite index.
  • If a query has additional properties in the filter that were not defined in a composite index, then a combination of composite and range indexes will be used to evaluate the query. This will require fewer RUs than exclusively using range indexes.
  • If a property has a range filter (>, <, <=, >=, or !=), then this property should be defined last in the composite index. If a query has more than one range filter, it will not utilize the composite index.
  • When creating a composite index to optimize queries with multiple filters, the order (ascending or descending) of the composite index paths has no impact on the results. This property is optional.
  • If you do not define a composite index for a query with filters on multiple properties, the query will still succeed. However, the RU cost of the query can be reduced with a composite index.

Consider the following examples where a composite index is defined on properties name, age, and timestamp:

Composite Index | Sample Query | Supported by Composite Index?
(name ASC, age ASC) | SELECT * FROM c WHERE c.name = "John" AND c.age = 18 | Yes
(name ASC, age ASC) | SELECT * FROM c WHERE c.name = "John" AND c.age > 18 | Yes
(name DESC, age ASC) | SELECT * FROM c WHERE c.name = "John" AND c.age > 18 | Yes
(name ASC, age ASC) | SELECT * FROM c WHERE c.name != "John" AND c.age > 18 | No
(name ASC, age ASC, timestamp ASC) | SELECT * FROM c WHERE c.name = "John" AND c.age = 18 AND c.timestamp > 123049923 | Yes
(name ASC, age ASC, timestamp ASC) | SELECT * FROM c WHERE c.name = "John" AND c.age < 18 AND c.timestamp = 123049923 | No
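The filter rules above can be modeled with a small check. This is a simplified sketch, not the engine's planner: it represents a query's filters as a hypothetical mapping from property name to comparison operator, and the helper supports_filters encodes only the considerations listed in this section.

```python
RANGE_OPERATORS = {">", "<", "<=", ">=", "!="}


def supports_filters(index_paths, filters):
    """Model the rules above for a composite index given as an ordered list
    of property names and filters given as {property: operator}."""
    range_props = [prop for prop, op in filters.items() if op in RANGE_OPERATORS]
    # A query with more than one range filter does not use the composite index.
    if len(range_props) > 1:
        return False
    # Every property in the composite index must appear in the query's filter.
    if any(prop not in filters for prop in index_paths):
        return False
    # The property with the range filter must be defined last in the index.
    if range_props and range_props[0] in index_paths:
        return index_paths[-1] == range_props[0]
    return True


# WHERE c.name = "John" AND c.age > 18 with an index on (name, age)
print(supports_filters(["name", "age"], {"name": "=", "age": ">"}))
# WHERE c.name = "John" AND c.age < 18 AND c.timestamp = 123049923
print(supports_filters(["name", "age", "timestamp"],
                       {"name": "=", "age": "<", "timestamp": "="}))
```

The ascending or descending order of each path is deliberately absent from the model, since it has no effect on filter support.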

Queries with a filter as well as an ORDER BY clause

If a query filters on one or more properties and has different properties in the ORDER BY clause, it may be helpful to add the properties in the filter to the ORDER BY clause.

For example, by adding the properties in the filter to the ORDER BY clause, the following query could be rewritten to leverage a composite index:

Query using range index:

SELECT * FROM c WHERE c.name = "John" ORDER BY c.timestamp

Query using composite index:

SELECT * FROM c WHERE c.name = "John" ORDER BY c.name, c.timestamp

The same pattern and query optimizations can be generalized to queries with multiple equality filters:

Query using range index:

SELECT * FROM c WHERE c.name = "John" AND c.age = 18 ORDER BY c.timestamp

Query using composite index:

SELECT * FROM c WHERE c.name = "John" AND c.age = 18 ORDER BY c.name, c.age, c.timestamp

The following considerations apply when creating composite indexes to optimize a query with a filter and an ORDER BY clause:

  • If the query filters on properties, these should be included first in the ORDER BY clause.
  • If you do not define a composite index on a query with a filter on one property and a separate ORDER BY clause using a different property, the query will still succeed. However, the RU cost of the query can be reduced with a composite index, particularly if the property in the ORDER BY clause has a high cardinality.
  • All considerations for creating composite indexes for ORDER BY queries with multiple properties, as well as for queries with filters on multiple properties, still apply.

Composite Index | Sample ORDER BY Query | Supported by Composite Index?
(name ASC, timestamp ASC) | SELECT * FROM c WHERE c.name = "John" ORDER BY c.name ASC, c.timestamp ASC | Yes
(name ASC, timestamp ASC) | SELECT * FROM c WHERE c.name = "John" ORDER BY c.timestamp ASC, c.name ASC | No
(name ASC, timestamp ASC) | SELECT * FROM c WHERE c.name = "John" ORDER BY c.timestamp ASC | No
(age ASC, name ASC, timestamp ASC) | SELECT * FROM c WHERE c.age = 18 AND c.name = "John" ORDER BY c.age ASC, c.name ASC, c.timestamp ASC | Yes
(age ASC, name ASC, timestamp ASC) | SELECT * FROM c WHERE c.age = 18 AND c.name = "John" ORDER BY c.timestamp ASC | No

Modifying the indexing policy

A container's indexing policy can be updated at any time by using the Azure portal or one of the supported SDKs. An update to the indexing policy triggers a transformation from the old index to the new one, which is performed online and in place (so no additional storage space is consumed during the operation). The old policy's index is efficiently transformed to the new policy without affecting the write availability or the throughput provisioned on the container. Index transformation is an asynchronous operation, and the time it takes to complete depends on the provisioned throughput, the number of items, and their size.

Note

While a range or spatial index is being added, queries may not return all the matching results, and they will do so without returning any errors. This means that query results may not be consistent until the index transformation is completed. It is possible to track the progress of index transformation by using one of the SDKs.

If the new indexing policy's mode is set to Consistent, no other indexing policy change can be applied while the index transformation is in progress. A running index transformation can be canceled by setting the indexing policy's mode to None (which will immediately drop the index).

Indexing policies and TTL

The Time-to-Live (TTL) feature requires indexing to be active on the container where it is turned on. This means that:

  • it is not possible to activate TTL on a container where the indexing mode is set to None,
  • it is not possible to set the indexing mode to None on a container where TTL is activated.

For scenarios where no property path needs to be indexed, but TTL is required, you can use an indexing policy with:

  • an indexing mode set to Consistent, and
  • no included path, and
  • /* as the only excluded path.
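The three settings above translate into a very small policy. The sketch below assumes the JSON field names indexingMode, includedPaths, and excludedPaths; the helper allows_ttl is hypothetical and only restates the constraint that TTL cannot be combined with the None indexing mode.

```python
# Illustrative policy that indexes no property paths but keeps indexing active,
# so that TTL can still be enabled on the container.
ttl_only_policy = {
    "indexingMode": "consistent",
    "includedPaths": [],
    "excludedPaths": [{"path": "/*"}],
}


def allows_ttl(policy):
    """TTL requires indexing to be active, so the mode must not be none."""
    return policy["indexingMode"].lower() != "none"


print(allows_ttl(ttl_only_policy))
print(allows_ttl({"indexingMode": "none"}))
```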

Next steps

Read more about indexing in the following articles: