Azure Cosmos DB for MongoDB vCore 中基于方案的索引编制指南

适用对象: MongoDB vCore

索引是提高数据检索速度的结构,可用于快速访问集合中的特定字段。 本文介绍如何在不同的嵌套级别执行索引编制,并复习如何有效地审查这些索引的利用率。

索引编制方案

我们将根据定义的示例 json 的上下文来研究示例方案。

{
  "_id": "e79b564e-48b1-4f75-990f-e62de2449239",
  "car_id":"AZ-9874532",
  "car_info": {
    "make": "Mustang",
    "model": "GT Fastback",
    "year": 2024,
    "registration": {
      "license_plate": "LJX386",
      "state": "WV",
      "registration_datetime": {
        "$date": "2024-01-10T01:16:44.000Z"
      },
      "expiration_datetime": {
        "$date": "2034-01-10T01:16:44.000Z"
      }
    }
  },
  "rental_history": [
    {
      "rental_id": "RT63857499825952",
      "customer_id": "CX8716",
      "start_date": {
        "$date": "2024-02-29T01:16:44.000Z"
      },
      "end_date": {
        "$date": "2024-03-04T16:54:44.000Z"
      },
      "pickup_location": { "type": "Point", "coordinates": [ -73.97, 40.77 ]
      },
      "drop_location": { "type": "Point", "coordinates": [ -73.96, 40.78 ]
      },
      "total_price": 232.56944444444443,
      "daily_rent": 50,
      "complains": [ 
        {
          "complain_id": "CMP638574998259520",
          "issue": "Strange odor inside the car.",
          "reported_datetime": {
            "$date": "2024-03-03T20:11:44.000Z"
          },
          "reported_medium": "Website",
          "resolutions": [
            {
              "resolution_datetime": {
                "$date": "2024-03-03T20:20:44.000Z"
              },
              "solution": "Inspect for any leftover food, spills, or trash that might be causing the odor. Contact the rental agency.",
              "resolved": true
            }
          ]
        }
      ],
      "accidents": [
        {
          "accident_id": "ACC376184",
          "date": {
            "$date": "2024-03-03T01:47:44.000Z"
          },
          "description": "Collisions with Soft Barriers: Accidents involving hitting bushes, shrubs, or other soft barriers.",
          "repair_cost": 147
        }
      ]
    },
    {
      "rental_id": "RT63857499825954",
      "customer_id": "CX1412",
      "start_date": {
        "$date": "2033-11-18T01:16:44.000Z"
      },
      "end_date": {
        "$date": "2033-11-25T21:11:44.000Z"
      },
      "pickup_location": { "type": "Point", "coordinates": [ 40, 5 ]
      },
      "drop_location": { "type": "Point", "coordinates": [ 41, 11 ]
      },
      "total_price": 305.3645833333333,
      "daily_rent": 39,
      "complains": [
        {
          "complain_id": "CMP638574998259540",
          "issue": "Unresponsive infotainment system.",
          "reported_datetime": {
            "$date": "2033-11-19T17:55:44.000Z"
          },
          "reported_medium": "Agency",
          "resolutions": []
        }
      ],
      "accidents": null
    }
  ],
  "junk": null
}

为根字段编制索引

Azure Cosmos DB for MongoDB vCore 允许对根属性编制索引。 该示例允许按照 car_id 搜索 sampleColl

CarData> db.sampleColl.createIndex({"car_id":1})

执行计划允许使用 explain 审查对 car_id 字段创建的索引的利用率。

CarData> db.sampleColl.find({"car_id":"ZA-XWB804"}).explain()

{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'sampleColl', 'filter' : { 'car_id' : 'ZA-XWB804' } }})",
  explainCommandPlanningTimeMillis: 0.156,
  explainCommandExecTimeMillis: 37.956,
  dataSize: '32 kB',
  queryPlanner: {
    namespace: 'CarData.sampleColl',
    winningPlan: {
      stage: 'FETCH',
      estimatedTotalKeysExamined: 8700,
      inputStage: {
        stage: 'IXSCAN',
        indexName: 'car_id_1',
        isBitmap: true,
        indexFilterSet: [ { '$eq': { car_id: 'ZA-XWB804' } } ],
        estimatedTotalKeysExamined: 174
      }
    }
  },
  ok: 1
}

为嵌套属性编制索引

Azure Cosmos DB for MongoDB vCore 允许为嵌入的文档属性编制索引。 该示例对嵌套文档 registration 中的字段 registration_datetime 创建索引。

CarData> db.sampleColl.createIndex({"car_info.registration.registration_datetime":1})

使用 explain 的审查执行计划可以提供索引扫描的见解。

CarData> db.sampleColl.find({"car_info.registration.registration_datetime":
                              {  $gte : new ISODate("2024-05-01")
                                ,$lt: ISODate("2024-05-07")
                              }
                            }).explain()


{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'sampleColl', 'filter' : { 'car_info.registration.registration_datetime' : { '$gte' : ISODate('2024-05-01T00:00:00Z'), '$lt' : ISODate('2024-05-07T00:00:00Z') } } }})",
  explainCommandPlanningTimeMillis: 0.095,
  explainCommandExecTimeMillis: 42.703,
  dataSize: '4087 kB',
  queryPlanner: {
    namespace: 'CarData.sampleColl',
    winningPlan: {
      stage: 'FETCH',
      estimatedTotalKeysExamined: 4350,
      inputStage: {
        stage: 'IXSCAN',
        indexName: 'car_info.registration.registration_datetime_1',
        isBitmap: true,
        indexFilterSet: [
          {
            '$range': {
              'car_info.registration.registration_datetime': {
                min: ISODate("2024-05-01T00:00:00.000Z"),
                max: ISODate("2024-05-07T00:00:00.000Z"),
                minInclusive: true,
                maxInclusive: false
              }
            }
          }
        ],
        estimatedTotalKeysExamined: 2
      }
    }
  },
  ok: 1
}

为根处的数组编制索引

Azure Cosmos DB for MongoDB vCore 允许为定义为数组的根属性编制索引。 让我们考虑以下 json 示例。

{
  "_id": ObjectId("58f56170ee9d4bd5e610d644"),
  "id": 1,
  "num": 001,
  "name": "Bulbasaur",
  "img": "http://www.serebii.net/pokemongo/pokemon/001.png",
  "type": [ 'Grass', 'Poison' ],
  "height": '0.71 m',
  "weight": '6.9 kg',
  "avg_spawns": 69,
  "spawn_time": "20:00",
  "multipliers": [ 1.58 ],
  "weaknesses": [ "Fire", "Ice", "Flying", "Psychic"],
  "next_evolution": [ { "num": "002", "name": "Ivysaur" }, { "num": "003", "name": "Venusaur" }]
}

在本示例中,我们对 weaknesses 数组字段创建索引,并检查数组中的所有三个值是否存在:GroundWaterFire

Cosmicworks> db.Pokemon.createIndex({'weaknesses':1})

Cosmicworks> db.Pokemon.find({"weaknesses":
                                {$all:["Ground","Water","Fire"]}
                              }
                            ).explain()

{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'Pokemon', 'filter' : { 'weaknesses' : { '$all' : ['Ground', 'Water', 'Fire'] } } }})",
  explainCommandPlanningTimeMillis: 10.161,
  explainCommandExecTimeMillis: 21.64,
  dataSize: '906 bytes',
  queryPlanner: {
    namespace: 'Cosmicworks.Pokemon',
    winningPlan: {
      stage: 'FETCH',
      estimatedTotalKeysExamined: 50,
      inputStage: {
        stage: 'IXSCAN',
        indexName: 'weaknesses_1',
        isBitmap: true,
        indexFilterSet: [
          { '$all': { weaknesses: [ 'Ground', 'Water', 'Fire' ] } }
        ],
        estimatedTotalKeysExamined: 2
      }
    }
  },
  ok: 1
}

注意

对于 MongoServerError:索引键太大。

请创建支持请求以启用后台索引编制,然后设置 enableLargeIndexKeys

db.runCommand({ createIndexes: "collectionName", indexes: [{ {"index_spec"}], enableLargeIndexKeys: true });

为嵌套数组编制索引

Azure Cosmos DB for MongoDB vCore 允许为嵌套数组编制索引。 该示例对 complains 数组中的 resolutions 字段创建索引。

CarData> db.sampleColl.createIndex({"rental_history.complains.resolutions":1})

我们使用 explain 审查计划,以识别所有租赁情况,但不向客户提供解决方案。

CarData> db.sampleColl.find({"rental_history.complains.resolutions":{ $exists: false, $ne: []}}).explain()

{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'sampleColl', 'filter' : { 'rental_history.complains.resolutions' : { '$exists' : false, '$ne' : [] } } }})",
  explainCommandPlanningTimeMillis: 0.12,
  explainCommandExecTimeMillis: 48.721000000000004,
  dataSize: '1747 kB',
  queryPlanner: {
    namespace: 'CarData.sampleColl',
    winningPlan: {
      stage: 'FETCH',
      estimatedTotalKeysExamined: 1933,
      inputStage: {
        stage: 'IXSCAN',
        indexName: 'rental_history.complains.resolutions_1',
        isBitmap: true,
        indexFilterSet: [
          {
            '$exists': { 'rental_history.complains.resolutions': false }
          },
          { '$ne': { 'rental_history.complains.resolutions': [] } }
        ],
        estimatedTotalKeysExamined: 2
      }
    }
  },
  ok: 1
}

为数组中的特定字段编制索引

Azure Cosmos DB for MongoDB vCore 允许为数组中的字段编制索引。 该示例对 accidents 数组中的 date 字段创建索引。

CarData> db.sampleColl.createIndex({"rental_history.accidents.date":1})

示例查询评估某个时间范围内的事件,并显示正在使用的、对 date 属性创建的索引。

CarData> db.sampleColl.find({"rental_history.accidents.date":
                                { $gte : ISODate("2024-05-01")
                                , $lt  : ISODate("2024-05-07")
                                }
                            }).explain()

{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'sampleColl', 'filter' : { 'rental_history.accidents.date' : { '$gte' : ISODate('2024-05-01T00:00:00Z'), '$lt' : ISODate('2024-05-07T00:00:00Z') } } }})",
  explainCommandPlanningTimeMillis: 19.816,
  explainCommandExecTimeMillis: 48.359,
  dataSize: '12 MB',
  queryPlanner: {
    namespace: 'CarData.sampleColl',
    winningPlan: {
      stage: 'FETCH',
      estimatedTotalKeysExamined: 4350,
      inputStage: {
        stage: 'IXSCAN',
        indexName: 'rental_history.accidents.date_1',
        isBitmap: true,
        indexFilterSet: [
          {
            '$range': {
              'rental_history.accidents.date': {
                min: ISODate("2024-05-01T00:00:00.000Z"),
                max: ISODate("2024-05-07T00:00:00.000Z"),
                minInclusive: true,
                maxInclusive: false
              }
            }
          }
        ],
        estimatedTotalKeysExamined: 2
      }
    }
  },
  ok: 1
}

注意

我们目前正在增强对嵌套数组的支持。 在某些极端情况下,特定的索引编制操作可能会导致错误。

在排除嵌套字段的同时进行通配符索引编制

Azure Cosmos DB for MongoDB vCore 支持通配符索引。 该示例允许我们排除对文档 car_info 中所有嵌套字段的索引编制。

// Excludes all the nested sub-document property 
CarData> db.sampleColl.createIndex(  {"$**":1}
                                    ,{"wildcardProjection":
                                          {  "car_info.make":0
                                            ,"car_info.model":0
                                            ,"car_info.registration":0
                                            ,"car_info.year":0
                                            ,"rental_history":0
                                          }
                                      }
                                  )

执行计划会显示不支持对 model 字段执行的查询,该字段在创建通配符索引时已被排除。

CarData> db.sampleColl.find({"car_info.model":"GT Fastback"}).explain()
{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'sampleColl', 'filter' : { 'car_info.model' : 'GT Fastback' } }})",
  explainCommandPlanningTimeMillis: 10.879,
  explainCommandExecTimeMillis: 374.25100000000003,
  dataSize: '0 bytes',
  queryPlanner: {
    namespace: 'CarData.sampleColl',
    winningPlan: {
      stage: 'COLLSCAN',
      runtimeFilterSet: [ { '$eq': { 'car_info.model': 'GT Fastback' } } ],
      estimatedTotalKeysExamined: 8700
    }
  },
  ok: 1
}

在排除嵌套对象的同时进行通配符索引编制

Azure Cosmos DB for MongoDB vCore 支持通配符索引。 该示例允许我们从文档中排除嵌套对象。

// Wildcard index excluding nested object
[mongos] CarData> db.sampleColl.createIndex( {"$**":1},
                                             {"wildcardProjection":
                                                    {  "car_info":0
                                                      ,"rental_history":0
                                                    }
                                              }
                                            )

执行计划显示不支持对 car_info 文档中嵌套字段 make 执行的查询。

CarData> db.sampleColl.find({"car_info.make":"Mustang"}).explain()
{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'sampleColl', 'filter' : { 'car_info.make' : 'Mustang' } }})",
  explainCommandPlanningTimeMillis: 21.271,
  explainCommandExecTimeMillis: 337.475,
  dataSize: '0 bytes',
  queryPlanner: {
    namespace: 'CarData.sampleColl',
    winningPlan: {
      stage: 'COLLSCAN',
      runtimeFilterSet: [ { '$eq': { 'car_info.make': 'Mustang' } } ],
      estimatedTotalKeysExamined: 8700
    }
  },
  ok: 1
}

在排除包含嵌套数组的字段的同时进行通配符索引编制

该通配符索引示例允许从嵌套数组中排除字段。 我们使用 pokemon 集合,其中突出显示了 json 格式。

{
{
  "_id": ObjectId("58f56170ee9d4bd5e610d644"),
  "id": 1,
  "num": 001,
  "name": "Bulbasaur",
  "img": "http://www.serebii.net/pokemongo/pokemon/001.png",
  "type": [ 'Grass', 'Poison' ],
  "height": '0.71 m',
  "weight": '6.9 kg',
  "avg_spawns": 69,
  "spawn_time": "20:00",
  "multipliers": [ 1.58 ],
  "weaknesses": [ "Fire", "Ice", "Flying", "Psychic"],
  "next_evolution": [ { "num": "002", "name": "Ivysaur" }, { "num": "003", "name": "Venusaur" }]
}
}

我们将针对 json 中的所有字段创建索引,同时从数组中排除 numname 字段。

Cosmicworks> db.Pokemon.createIndex( {"$**":1},
                                     {"wildcardProjection":
                                        {  "id":0
                                          ,"name":0
                                          ,"multipliers":0
                                          ,"next_evolution.num":0
                                          ,"next_evolution.name":0
                                        }
                                      }
                                    )

解释计划在查询 next_evolution 数组中的 name 字段时不会显示索引利用率。

Cosmicworks> db.Pokemon.find({"next_evolution.name":"Venusaur"}).explain()
{
  explainVersion: 2,
  command: "db.runCommand({explain: { 'find' : 'Pokemon', 'filter' : { 'next_evolution.name' : 'Venusaur' } }})",
  explainCommandPlanningTimeMillis: 0.799,
  explainCommandExecTimeMillis: 0.869,
  dataSize: '1090 bytes',
  queryPlanner: {
    namespace: 'Cosmicworks.Pokemon',
    winningPlan: {
      stage: 'COLLSCAN',
      runtimeFilterSet: [ { '$eq': { 'next_evolution.name': 'Venusaur' } } ],
      estimatedTotalKeysExamined: 76
    }
  },
  ok: 1
}

后续步骤