Manage indexing in Azure Cosmos DB's API for MongoDB

Azure Cosmos DB's API for MongoDB takes advantage of the core index-management capabilities of Azure Cosmos DB. This article focuses on how to add indexes using Azure Cosmos DB's API for MongoDB. You can also read an overview of indexing in Azure Cosmos DB that's relevant across all APIs.

Indexing for MongoDB server version 3.6

Azure Cosmos DB's API for MongoDB server version 3.6 automatically indexes the _id field, which can't be dropped. It automatically enforces the uniqueness of the _id field per shard key. In Azure Cosmos DB's API for MongoDB, sharding and indexing are separate concepts. You don't have to index your shard key. However, as with any other property in your document, if the shard key is a common filter in your queries, we recommend indexing it.

To index additional fields, you apply the MongoDB index-management commands. As in MongoDB, Azure Cosmos DB's API for MongoDB automatically indexes the _id field only. This default indexing policy differs from the Azure Cosmos DB SQL API, which indexes all fields by default.

To apply a sort to a query, you must create an index on the fields used in the sort operation.

Index types

Single field

You can create indexes on any single field. The sort order of the single field index does not matter. The following command creates an index on the field name:

db.coll.createIndex({name:1})

One query uses multiple single field indexes where available. You can create up to 500 single field indexes per container.

Compound indexes (MongoDB server version 3.6)

Azure Cosmos DB's API for MongoDB supports compound indexes for accounts that use the version 3.6 wire protocol. You can include up to eight fields in a compound index. Unlike in MongoDB, you should create a compound index only if your query needs to sort efficiently on multiple fields at once. For queries with multiple filters that don't need to sort, create multiple single field indexes instead of a single compound index.

The following command creates a compound index on the fields name and age:

db.coll.createIndex({name:1,age:1})

You can use compound indexes to sort efficiently on multiple fields at once, as shown in the following example:

db.coll.find().sort({name:1,age:1})

You can also use the preceding compound index to sort efficiently when a query reverses the sort order on all fields. Here's an example:

db.coll.find().sort({name:-1,age:-1})

However, the sequence of the paths in the compound index must exactly match the query. Here's an example of a query that would require an additional compound index:

db.coll.find().sort({age:1,name:1})

Note

You can't create compound indexes on nested properties or arrays.
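
The ordering rule described above can be sketched as a small check in plain JavaScript. This is an illustration only, not part of the Azure Cosmos DB API: the function name canServeSort is hypothetical, and the sketch assumes a sort can use a compound index only when it lists every indexed field in the same sequence, with directions either all identical or all reversed.

```javascript
// Hypothetical helper (not an Azure Cosmos DB API): checks a sort
// specification against a compound index definition. Assumption: the sort
// must list the same fields in the same sequence as the index, with
// directions either all identical or all reversed.
function canServeSort(indexSpec, sortSpec) {
  const indexFields = Object.entries(indexSpec);
  const sortFields = Object.entries(sortSpec);
  if (indexFields.length !== sortFields.length) return false;

  // Field names must appear in exactly the same sequence.
  const sameSequence = indexFields.every(([name], i) => name === sortFields[i][0]);
  if (!sameSequence) return false;

  const sameDirection = indexFields.every(([, dir], i) => dir === sortFields[i][1]);
  const reversed = indexFields.every(([, dir], i) => dir === -sortFields[i][1]);
  return sameDirection || reversed;
}

const index = { name: 1, age: 1 };
console.log(canServeSort(index, { name: 1, age: 1 }));   // true
console.log(canServeSort(index, { name: -1, age: -1 })); // true
console.log(canServeSort(index, { age: 1, name: 1 }));   // false
```

A check like this could run in application code before issuing a sorted query, to flag sorts that would need an additional compound index.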

Multikey indexes

Azure Cosmos DB creates multikey indexes to index content stored in arrays. If you index a field with an array value, Azure Cosmos DB automatically indexes every element in the array.

Geospatial indexes

Many geospatial operators will benefit from geospatial indexes. Currently, Azure Cosmos DB's API for MongoDB supports 2dsphere indexes. The API does not yet support 2d indexes.

Here's an example of creating a geospatial index on the location field:

db.coll.createIndex({ location : "2dsphere" })

Text indexes

Azure Cosmos DB's API for MongoDB does not currently support text indexes. For text search queries on strings, you should use Azure Cognitive Search integration with Azure Cosmos DB.

Wildcard indexes

You can use wildcard indexes to support queries against unknown fields. Let's imagine you have a collection that holds data about families.

Here is part of an example document in that collection:

  "children": [
     {
         "firstName": "Henriette Thaulow",
         "grade": "5"
     }
  ]

Here's another example, this time with a slightly different set of properties in children:

  "children": [
      {
        "familyName": "Merriam",
        "givenName": "Jesse",
        "pets": [
            { "givenName": "Goofy" },
            { "givenName": "Shadow" }
        ]
      },
      {
        "familyName": "Merriam",
        "givenName": "John"
      }
  ]

In this collection, documents can have many different possible properties. If you want to index all the data in the children array, you have two options: create separate indexes for each individual property, or create one wildcard index for the entire children array.

Create a wildcard index

The following command creates a wildcard index on any properties within children:

db.coll.createIndex({"children.$**" : 1})

Unlike in MongoDB, wildcard indexes can support multiple fields in query predicates. There will not be a difference in query performance if you use one single wildcard index instead of creating a separate index for each property.

You can create the following index types using wildcard syntax:

  • Single field
  • Geospatial

Indexing all properties

Here's how you can create a wildcard index on all fields:

db.coll.createIndex( { "$**" : 1 } )

As you are starting development, it may be useful to create a wildcard index on all fields. As more properties are indexed in a document, the Request Unit (RU) charge for writing and updating the document will increase. Therefore, if you have a write-heavy workload, you should opt to individually index paths as opposed to using wildcard indexes.

Limitations

Wildcard indexes do not support any of the following index types or properties:

  • Compound
  • TTL
  • Unique

Unlike in MongoDB, in Azure Cosmos DB's API for MongoDB you can't use wildcard indexes for:

  • Creating a wildcard index that includes multiple specific fields

db.coll.createIndex( { "$**" : 1 }, { "wildcardProjection" : { "children.givenName" : 1, "children.grade" : 1 } } )

  • Creating a wildcard index that excludes multiple specific fields

db.coll.createIndex( { "$**" : 1 }, { "wildcardProjection" : { "children.givenName" : 0, "children.grade" : 0 } } )

As an alternative, you could create multiple wildcard indexes.
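
As a rough sketch of that alternative, the following plain JavaScript builds one wildcard index key per path, using the same "path.$**" syntax shown earlier for children. The helper name wildcardIndexKeys and the address path are hypothetical, chosen only for illustration. Whether a given path is a good candidate for its own wildcard index depends on your documents and queries.

```javascript
// Hypothetical helper (not an Azure Cosmos DB API): builds one wildcard
// index key per path instead of relying on wildcardProjection.
function wildcardIndexKeys(paths) {
  return paths.map((path) => ({ [`${path}.$**`]: 1 }));
}

// "address" is an assumed field name used only for this example.
const keys = wildcardIndexKeys(["children", "address"]);
console.log(JSON.stringify(keys));
// [{"children.$**":1},{"address.$**":1}]

// In the mongo shell, each key could then be passed to createIndex:
// keys.forEach((key) => db.coll.createIndex(key));
```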

Index properties

The following operations are common for accounts serving wire protocol version 3.6 and accounts serving earlier versions. You can learn more about supported indexes and indexed properties.

Unique indexes

Unique indexes are useful for enforcing that no two documents contain the same value for the indexed fields.

Important

Unique indexes can be created only when the collection is empty (contains no documents).

The following command creates a unique index on the field student_id:

globaldb:PRIMARY> db.coll.createIndex( { "student_id" : 1 }, {unique:true} )
{
        "_t" : "CreateIndexesResponse",
        "ok" : 1,
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 4
}

For sharded collections, you must provide the shard (partition) key to create a unique index. In other words, all unique indexes on a sharded collection are compound indexes where one of the fields is the partition key.

The following commands create a sharded collection coll (the shard key is university) with a unique index on the fields student_id and university:

globaldb:PRIMARY> db.runCommand({shardCollection: db.coll._fullName, key: { university: "hashed"}});
{
        "_t" : "ShardCollectionResponse",
        "ok" : 1,
        "collectionsharded" : "test.coll"
}
globaldb:PRIMARY> db.coll.createIndex( { "student_id" : 1, "university" : 1 }, {unique:true})
{
        "_t" : "CreateIndexesResponse",
        "ok" : 1,
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 3,
        "numIndexesAfter" : 4
}

In the preceding example, omitting the "university":1 clause returns an error with the following message:

"cannot create unique index over {student_id : 1.0} with shard key pattern { university : 1.0 }"

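One way to avoid that error is to add the shard key to the index key before calling createIndex. The following is a minimal sketch in plain JavaScript. The helper name withShardKey is hypothetical, and the sketch assumes an ascending (1) entry for the shard key, as in the example above.

```javascript
// Hypothetical helper (not an Azure Cosmos DB API): returns a copy of a
// unique-index key that includes the shard key field, adding it with an
// ascending direction (an assumption) when it is missing.
function withShardKey(indexKey, shardKeyField) {
  if (Object.prototype.hasOwnProperty.call(indexKey, shardKeyField)) {
    return { ...indexKey };
  }
  return { ...indexKey, [shardKeyField]: 1 };
}

const key = withShardKey({ student_id: 1 }, "university");
console.log(JSON.stringify(key)); // {"student_id":1,"university":1}

// In the mongo shell, the result could be used as:
// db.coll.createIndex(key, { unique: true })
```
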
TTL indexes

To enable document expiration in a particular collection, you need to create a time-to-live (TTL) index. A TTL index is an index on the _ts field with an expireAfterSeconds value.

Example:

globaldb:PRIMARY> db.coll.createIndex({"_ts":1}, {expireAfterSeconds: 10})

The preceding command deletes any documents in the db.coll collection that have not been modified in the last 10 seconds.

Note

The _ts field is specific to Azure Cosmos DB and is not accessible from MongoDB clients. It is a reserved (system) property that contains the time stamp of the document's last modification.
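
The expiry rule can be illustrated with a small calculation. This sketch only models the comparison between the last modification time and expireAfterSeconds. It assumes time stamps are expressed in seconds since the Unix epoch, the function name isExpired is hypothetical, and it says nothing about exactly when the service removes an expired document.

```javascript
// Sketch of the expiry rule (not an Azure Cosmos DB API): a document is
// eligible for deletion once more than expireAfterSeconds have passed since
// its last modification. Assumes all time stamps are Unix epoch seconds.
function isExpired(lastModifiedSeconds, expireAfterSeconds, nowSeconds) {
  return nowSeconds - lastModifiedSeconds > expireAfterSeconds;
}

const now = 1700000000; // arbitrary example time stamp
console.log(isExpired(now - 5, 10, now));  // false: modified 5 seconds ago
console.log(isExpired(now - 30, 10, now)); // true: modified 30 seconds ago
```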

Track index progress

Version 3.6 of Azure Cosmos DB's API for MongoDB supports the currentOp() command to track index progress on a database instance. This command returns a document that contains information about in-progress operations on a database instance. In native MongoDB, you use the currentOp command to track all in-progress operations. In Azure Cosmos DB's API for MongoDB, this command only supports tracking the index operation.

Here are some examples that show how to use the currentOp command to track index progress:

  • Get the index progress for a collection:

    db.currentOp({"command.createIndexes": <collectionName>, "command.$db": <databaseName>})
    
  • Get the index progress for all collections in a database:

    db.currentOp({"command.$db": <databaseName>})
    
  • Get the index progress for all databases and collections in an Azure Cosmos account:

    db.currentOp({"command.createIndexes": { $exists : true } })
    

Examples of index progress output

The index progress details show the percentage of progress for the current index operation. The following examples show the output document format for different stages of index progress:

  • An index operation on a "foo" collection and "bar" database that is 60 percent complete will have the following output document. The inprog[0].progress.total field shows 100 as the target completion percentage.

    {
        "inprog" : [
        {
                ………………...
                "command" : {
                        "createIndexes" : "foo",
                        "indexes" :[ ],
                        "$db" : "bar"
                },
                "msg" : "Index Build (background) Index Build (background): 60 %",
                "progress" : {
                        "done" : 60,
                        "total" : 100
                },
                …………..…..
        }
        ],
        "ok" : 1
    }
    
  • If an index operation has just started on a "foo" collection and "bar" database, the output document might show 0 percent progress until it reaches a measurable level.

    {
        "inprog" : [
        {
                ………………...
                "command" : {
                        "createIndexes" : "foo",
                        "indexes" :[ ],
                        "$db" : "bar"
                },
                "msg" : "Index Build (background) Index Build (background): 0 %",
                "progress" : {
                        "done" : 0,
                        "total" : 100
                },
                …………..…..
        }
        ],
       "ok" : 1
    }
    
  • When the in-progress index operation finishes, the output document shows empty inprog operations.

    {
      "inprog" : [],
      "ok" : 1
    }
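
The output documents above can be summarized programmatically. The following is a minimal sketch in plain JavaScript that reads the inprog array and reports a percentage for each index build. The helper name indexProgress is hypothetical, and the sample object only mirrors the fields shown in the examples above (command.createIndexes, command.$db, progress.done, progress.total). Real output contains additional fields.

```javascript
// Hypothetical helper (not an Azure Cosmos DB API): summarizes index builds
// from a document shaped like the currentOp() output shown above.
function indexProgress(currentOpResult) {
  return (currentOpResult.inprog || [])
    .filter((op) => op.command && op.command.createIndexes && op.progress)
    .map((op) => ({
      collection: op.command.createIndexes,
      database: op.command.$db,
      percent: Math.round((op.progress.done / op.progress.total) * 100),
    }));
}

// Sample input that mirrors only the fields shown in the examples above.
const sample = {
  inprog: [
    {
      command: { createIndexes: "foo", indexes: [], $db: "bar" },
      progress: { done: 60, total: 100 },
    },
  ],
  ok: 1,
};

console.log(indexProgress(sample)[0].percent);            // 60
console.log(indexProgress({ inprog: [], ok: 1 }).length); // 0 (no index builds in progress)
```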
    

Background index updates

Regardless of the value specified for the Background index property, index updates are always done in the background. Because index updates consume Request Units (RUs) at a lower priority than other database operations, index changes won't result in any downtime for writes, updates, or deletes.

There is no impact to read availability when adding a new index. Queries will only utilize new indexes once the index transformation is complete. During the index transformation, the query engine continues to use existing indexes, so read performance during the transformation is similar to what you observed before initiating the indexing change. When adding new indexes, there is also no risk of incomplete or inconsistent query results.

When removing indexes and immediately running queries that have filters on the dropped indexes, results might be inconsistent and incomplete until the index transformation finishes. If you remove indexes, the query engine does not guarantee consistent or complete results when queries filter on these newly removed indexes. Most developers do not drop indexes and then immediately try to query them, so in practice this situation is unlikely.

Note

You can track index progress.

Migrate collections with indexes

Currently, you can only create unique indexes when the collection contains no documents. Popular MongoDB migration tools try to create the unique indexes after importing the data. To circumvent this issue, you can manually create the corresponding collections and unique indexes instead of allowing the migration tool to try. (You can achieve this behavior for mongorestore by using the --noIndexRestore flag in the command line.)
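
When preparing such a migration, it can help to separate the unique index definitions, which must exist before any data is imported, from the rest. The following is a minimal sketch in plain JavaScript. The helper name splitIndexes and the sample definitions are hypothetical, and the input is assumed to have the shape returned by db.coll.getIndexes() on the source deployment.

```javascript
// Hypothetical helper (not part of any migration tool): separates unique
// index definitions, which must be created while the target collection is
// still empty, from the remaining ones.
function splitIndexes(indexDefs) {
  // The _id index is created automatically, so it is skipped here.
  const custom = indexDefs.filter((def) => def.name !== "_id_");
  return {
    createBeforeImport: custom.filter((def) => def.unique === true),
    createAfterImport: custom.filter((def) => def.unique !== true),
  };
}

// Assumed sample definitions, shaped like db.coll.getIndexes() output.
const defs = [
  { name: "_id_", key: { _id: 1 } },
  { name: "student_id_1", key: { student_id: 1 }, unique: true },
  { name: "name_1", key: { name: 1 } },
];

const plan = splitIndexes(defs);
console.log(plan.createBeforeImport.map((d) => d.name)); // [ 'student_id_1' ]
console.log(plan.createAfterImport.map((d) => d.name));  // [ 'name_1' ]
```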

Indexing for MongoDB version 3.2

Available indexing features and defaults are different for Azure Cosmos accounts that are compatible with version 3.2 of the MongoDB wire protocol. You can check your account's version. You can upgrade to the 3.6 version by filing a support request.

If you're using version 3.2, this section outlines key differences with version 3.6.

Dropping default indexes (version 3.2)

Unlike the 3.6 version of Azure Cosmos DB's API for MongoDB, version 3.2 indexes every property by default. You can use the following command to drop these default indexes for a collection (coll):

> db.coll.dropIndexes()
{ "_t" : "DropIndexesResponse", "ok" : 1, "nIndexesWas" : 3 }

After dropping the default indexes, you can add more indexes as you would in version 3.6.

Compound indexes (version 3.2)

Compound indexes hold references to multiple fields of a document. If you want to create a compound index, upgrade to version 3.6 by filing a support request.

Wildcard indexes (version 3.2)

If you want to create a wildcard index, upgrade to version 3.6 by filing a support request.

Next steps