Azure Cosmos DB 中的索引 - 概述Indexing in Azure Cosmos DB - Overview

Azure Cosmos DB 是一种架构不可知的数据库,你可用它来迭代应用程序,而无需处理架构或索引管理。Azure Cosmos DB is a schema-agnostic database that allows you to iterate on your application without having to deal with schema or index management. 默认情况下,Azure Cosmos DB 自动对容器中所有项的每个属性编制索引,不用定义任何架构或配置辅助索引。By default, Azure Cosmos DB automatically indexes every property for all items in your container without having to define any schema or configure secondary indexes.

本文的目的是说明 Azure Cosmos DB 如何为数据编制索引以及如何使用索引来提高查询性能。The goal of this article is to explain how Azure Cosmos DB indexes data and how it uses indexes to improve query performance. 建议先阅读本部分,然后再探索如何自定义索引策略It is recommended to go through this section before exploring how to customize indexing policies.

从项到树From items to trees

每次在容器中存储项时,项的内容都投影为 JSON 文档,然后转换为树表示形式。Every time an item is stored in a container, its content is projected as a JSON document, then converted into a tree representation. 这意味着,该项的每个属性都在树中以节点的形式表示。What that means is that every property of that item gets represented as a node in a tree. 伪根节点被创建为项的所有第一级属性的父级。A pseudo root node is created as a parent to all the first-level properties of the item. 叶节点包含项带有的实际标量值。The leaf nodes contain the actual scalar values carried by an item.

例如,请看以下项:As an example, consider this item:

{
    "locations": [
        { "country": "Germany", "city": "Berlin" },
        { "country": "France", "city": "Paris" }
    ],
    "headquarters": { "country": "Belgium", "employees": 250 },
    "exports": [
        { "city": "Moscow" },
        { "city": "Athens" }
    ]
}

它由以下树表示:It would be represented by the following tree:

上一项以树的形式表示

请注意数组是如何在树中进行编码的:数组中的每个条目都获得一个中间节点,该节点标记了该数组中该条目的索引(0、1 等等)。Note how arrays are encoded in the tree: every entry in an array gets an intermediate node labeled with the index of that entry within the array (0, 1 etc.).

从树到属性路径From trees to property paths

Azure Cosmos DB 将项转换为树的原因是,它允许通过这些树中属性的路径来引用属性。The reason why Azure Cosmos DB transforms items into trees is because it allows properties to be referenced by their paths within those trees. 若要获取属性的路径,可从根节点到该属性来遍历树,并将每个遍历的节点的标签连接起来。To get the path for a property, we can traverse the tree from the root node to that property, and concatenate the labels of each traversed node.

下面是上述示例项中每个属性的路径:Here are the paths for each property from the example item described above:

  • /locations/0/country:"Germany"/locations/0/country: "Germany"
  • /locations/0/city:"Berlin"/locations/0/city: "Berlin"
  • /locations/1/country:"France"/locations/1/country: "France"
  • /locations/1/city:"Paris"/locations/1/city: "Paris"
  • /headquarters/country:"Belgium"/headquarters/country: "Belgium"
  • /headquarters/employees:250/headquarters/employees: 250
  • /exports/0/city:"Moscow"/exports/0/city: "Moscow"
  • /exports/1/city:"Athens"/exports/1/city: "Athens"

写入项时,Azure Cosmos DB 会有效地对每个属性的路径及其相应的值编制索引。When an item is written, Azure Cosmos DB effectively indexes each property's path and its corresponding value.

索引类型Index kinds

Azure Cosmos DB 目前支持三种类型的索引。Azure Cosmos DB currently supports three kinds of indexes.

范围索引Range Index

范围索引基于已排序的树形结构。Range index is based on an ordered tree-like structure. 范围索引类型用于:The range index kind is used for:

  • 相等查询:Equality queries:

    SELECT * FROM container c WHERE c.property = 'value'
    
    SELECT * FROM c WHERE c.property IN ("value1", "value2", "value3")
    

    数组元素上的相等匹配Equality match on an array element

    SELECT * FROM c WHERE ARRAY_CONTAINS(c.tags, "tag1")
    
  • 范围查询:Range queries:

    SELECT * FROM container c WHERE c.property > 'value'
    

    (适用于 ><>=<=!=(works for >, <, >=, <=, !=)

  • 检查属性是否存在:Checking for the presence of a property:

    SELECT * FROM c WHERE IS_DEFINED(c.property)
    
  • 字符串系统函数:String system functions:

    SELECT * FROM c WHERE CONTAINS(c.property, "value")
    
    SELECT * FROM c WHERE STRINGEQUALS(c.property, "value")
    
  • ORDER BY 查询:ORDER BY queries:

    SELECT * FROM container c ORDER BY c.property
    
  • JOIN 查询:JOIN queries:

    SELECT child FROM container c JOIN child IN c.properties WHERE child = 'value'
    

范围索引可用于标量值(字符串或数字)。Range indexes can be used on scalar values (string or number).

空间索引Spatial index

空间索引可对地理空间对象(例如点、线、多边形和多面)进行有效查询。Spatial indices enable efficient queries on geospatial objects such as - points, lines, polygons, and multipolygon. 这些查询使用ST_DISTANCE、ST_WITHIN 和 ST_INTERSECTS 关键字。These queries use ST_DISTANCE, ST_WITHIN, ST_INTERSECTS keywords. 下面是使用空间索引类型的一些示例:The following are some examples that use spatial index kind:

  • 地理空间距离查询:Geospatial distance queries:

    SELECT * FROM container c WHERE ST_DISTANCE(c.property, { "type": "Point", "coordinates": [0.0, 10.0] }) < 40
    
  • 在查询的地理空间:Geospatial within queries:

    SELECT * FROM container c WHERE ST_WITHIN(c.property, {"type": "Point", "coordinates": [0.0, 10.0] })
    
  • 地理空间相交查询:Geospatial intersect queries:

    SELECT * FROM c WHERE ST_INTERSECTS(c.property, { 'type':'Polygon', 'coordinates': [[ [31.8, -5], [32, -5], [31.8, -5] ]]  })  
    

空间索引可在格式正确的 GeoJSON 对象上使用。Spatial indexes can be used on correctly formatted GeoJSON objects. 目前支持点、线串、多边形和多面。Points, LineStrings, Polygons, and MultiPolygons are currently supported.

组合索引Composite indexes

对多个字段执行操作时,组合索引可提高效率。Composite indices increase the efficiency when you are performing operations on multiple fields. 组合索引类型用于:The composite index kind is used for:

  • 对多个属性的 ORDER BY 查询:ORDER BY queries on multiple properties:

     SELECT * FROM container c ORDER BY c.property1, c.property2
    
  • 使用筛选器和 ORDER BY 的查询。Queries with a filter and ORDER BY. 如果在 ORDER BY 子句中添加筛选器属性,则这些查询可使用组合索引。These queries can utilize a composite index if the filter property is added to the ORDER BY clause.

     SELECT * FROM container c WHERE c.property1 = 'value' ORDER BY c.property1, c.property2
    
  • 对两个或更多属性进行的带筛选器的查询,其中至少一个属性是等式筛选器Queries with a filter on two or more properties where at least one property is an equality filter

     SELECT * FROM container c WHERE c.property1 = 'value' AND c.property2 > 'value'
    

只要有一个筛选器谓词使用某一索引类型,查询引擎就将在扫描其余部分之前先评估该谓词。As long as one filter predicate uses one of the index kind, the query engine will evaluate that first before scanning the rest. 例如,如果你有一个 SQL 查询,如 SELECT * FROM c WHERE c.firstName = "Andrew" and CONTAINS(c.lastName, "Liu")For example, if you have a SQL query such as SELECT * FROM c WHERE c.firstName = "Andrew" and CONTAINS(c.lastName, "Liu")

  • 上面的查询将先使用索引筛选 firstName = "Andrew" 的条目,The above query will first filter for entries where firstName = "Andrew" by using the index. 然后通过后续管道传递所有 firstName = "Andrew" 的条目来评估 CONTAINS 筛选器谓词。It then pass all of the firstName = "Andrew" entries through a subsequent pipeline to evaluate the CONTAINS filter predicate.

  • 如果使用不采用索引(如 CONTAINS)的函数,可额外添加使用该索引的筛选器谓词来加快查询速度和避免完整容器扫描。You can speed up queries and avoid full container scans when using functions that don't use the index (e.g. CONTAINS) by adding additional filter predicates that do use the index. 筛选子句的顺序并不重要。The order of filter clauses isn't important. 查询引擎将确定哪些谓词更具选择性,并相应地运行查询。The query engine is will figure out which predicates are more selective and run the query accordingly.

使用索引进行查询Querying with indexes

通过编制数据索引时提取的路径,可在处理查询时轻松查找索引。The paths extracted when indexing data make it easy to lookup the index when processing a query. 通过将查询的 WHERE 子句与已编制索引的路径的列表相匹配,可快速确定与查询谓词匹配的项。By matching the WHERE clause of a query with the list of indexed paths, it is possible to identify the items that match the query predicate very quickly.

例如,请看以下查询:SELECT location FROM location IN company.locations WHERE location.country = 'France'For example, consider the following query: SELECT location FROM location IN company.locations WHERE location.country = 'France'. 查询谓词(对项进行筛选,其中任何位置都采用“法国”作为其国家/地区)与下面用红色突出显示的路径相匹配:The query predicate (filtering on items, where any location has "France" as its country/region) would match the path highlighted in red below:

上一项以树的形式表示

备注

按单个属性排序的 ORDER BY 子句总是需要一个范围索引,如果它引用的路径没有范围索引,则会失败。An ORDER BY clause that orders by a single property always needs a range index and will fail if the path it references doesn't have one. 同样地,按多个属性排序的 ORDER BY 查询总是需要一个组合索引。Similarly, an ORDER BY query which orders by multiple properties always needs a composite index.

后续步骤Next steps

阅读以下文章中有关索引的详细信息:Read more about indexing in the following articles: