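Multi-vector field support in Azure AI Search

Note

This feature is currently in public preview. This preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

The multi-vector field support feature in Azure AI Search lets you index multiple child vectors within a single document field. This is valuable for use cases such as multimodal data or long-form documents, where representing the content with a single vector would lose important detail.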

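Limitations

The semantic ranker isn't supported for nested chunks within a complex field. Consequently, the semantic ranker doesn't support nested vectors in multi-vector fields.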

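Understanding multi-vector field support

Traditionally, vector types such as Collection(Edm.Single) could be used only in top-level fields. With multi-vector field support, you can now use vector types in the nested fields of complex collections, which effectively allows multiple vectors to be associated with a single document.

A single document can contain up to 100 vectors in total, across all complex collection fields. Vector fields can be nested only one level deep.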
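The limit above can be checked on the client before a document is sent for indexing. The following is a minimal sketch, not part of the service or any SDK: the helper names `count_nested_vectors` and `check_vector_limit` are hypothetical, and it assumes the 100-vector limit counts only vectors nested in complex collections (the text doesn't say whether top-level vector fields count toward it).

```python
from typing import Any

MAX_VECTORS_PER_DOCUMENT = 100  # limit quoted in the text above


def count_nested_vectors(document: dict[str, Any],
                         vector_fields: dict[str, list[str]]) -> int:
    """Count non-null vectors stored in complex collection fields.

    vector_fields maps a complex collection name to the names of its
    vector subfields, for example {"scenes": ["embedding"]}.
    """
    total = 0
    for collection_name, subfields in vector_fields.items():
        # A missing or null collection contributes no vectors.
        for element in document.get(collection_name) or []:
            total += sum(1 for name in subfields if element.get(name) is not None)
    return total


def check_vector_limit(document: dict[str, Any],
                       vector_fields: dict[str, list[str]]) -> int:
    """Return the nested vector count, or raise if it exceeds the limit."""
    count = count_nested_vectors(document, vector_fields)
    if count > MAX_VECTORS_PER_DOCUMENT:
        raise ValueError(
            f"document has {count} nested vectors; "
            f"the limit is {MAX_VECTORS_PER_DOCUMENT}"
        )
    return count


if __name__ == "__main__":
    movie = {
        "id": "123",
        "scenes": [
            {"embedding": [4, 5, 6], "timestamp": 120},
            {"embedding": [7, 8, 9], "timestamp": 2400},
        ],
    }
    print(check_vector_limit(movie, {"scenes": ["embedding"]}))
```

A check like this only catches the count limit; the one-level nesting rule is enforced by the index schema itself.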

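Index definition with a multi-vector field

This feature requires no new index properties. Here's a sample index definition. It references a vector search profile named hnsw; the vectorSearch section that would define that profile isn't shown.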

{
  "name": "multivector-index",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true,
      "searchable": true
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true
    },
    {
      "name": "description",
      "type": "Edm.String",
      "searchable": true
    },
    {
      "name": "descriptionEmbedding",
      "type": "Collection(Edm.Single)",
      "dimensions": 3,
      "searchable": true,
      "retrievable": true,
      "vectorSearchProfile": "hnsw"
    },
    {
      "name": "scenes",
      "type": "Collection(Edm.ComplexType)",
      "fields": [
        {
          "name": "embedding",
          "type": "Collection(Edm.Single)",
          "dimensions": 3,
          "searchable": true,
          "retrievable": true,
          "vectorSearchProfile": "hnsw"
        },
        {
          "name": "timestamp",
          "type": "Edm.Int32",
          "retrievable": true
        },
        {
          "name": "description",
          "type": "Edm.String",
          "searchable": true,
          "retrievable": true
        },
        {
          "name": "framePath",
          "type": "Edm.String",
          "retrievable": true
        }
      ]
    }
  ]
}

Sample ingest document

Here's a sample document that shows how you might use multi-vector fields in practice:

{
  "id": "123",
  "title": "Non-Existent Movie",
  "description": "A fictional movie for demonstration purposes.",
  "descriptionEmbedding": [1, 2, 3],
  "scenes": [
    {
      "embedding": [4, 5, 6],
      "timestamp": 120,
      "description": "A character is introduced.",
      "framePath": "nonexistentmovie\\scenes\\scene120.png"
    },
    {
      "embedding": [7, 8, 9],
      "timestamp": 2400,
      "description": "The climax of the movie.",
      "framePath": "nonexistentmovie\\scenes\\scene2400.png"
    }
  ]
}

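In this example, the scenes field is a complex collection that contains multiple vectors (the embedding fields) along with other associated data. Each vector represents a scene from the movie and could be used, among other potential use cases, to find similar scenes in other movies.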
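As a rough sketch of how such a document might be prepared for upload, the code below wraps documents in the `value` / `@search.action` envelope used by the Index Documents REST operation and checks that each scene embedding matches the three dimensions declared in the sample index. The function name `build_upload_batch` is hypothetical, and the sketch only builds the request body; it doesn't call the service.

```python
import json
from typing import Any

EMBEDDING_DIMENSIONS = 3  # matches "dimensions": 3 in the sample index


def build_upload_batch(documents: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap documents in an indexing batch after checking scene embeddings."""
    actions = []
    for doc in documents:
        for position, scene in enumerate(doc.get("scenes", [])):
            size = len(scene["embedding"])
            if size != EMBEDDING_DIMENSIONS:
                raise ValueError(
                    f"document {doc['id']}, scene {position}: "
                    f"expected {EMBEDDING_DIMENSIONS} dimensions, got {size}"
                )
        # mergeOrUpload inserts the document or updates it if the key exists.
        actions.append({"@search.action": "mergeOrUpload", **doc})
    return {"value": actions}


if __name__ == "__main__":
    movie = {
        "id": "123",
        "title": "Non-Existent Movie",
        "scenes": [
            {"embedding": [4, 5, 6], "timestamp": 120},
            {"embedding": [7, 8, 9], "timestamp": 2400},
        ],
    }
    print(json.dumps(build_upload_batch([movie]), indent=2))
```

Validating dimensions on the client is optional, but it tends to surface a malformed scene earlier and with a clearer message than a failed indexing request.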

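Querying with multi-vector field support

Multi-vector field support introduces a few changes to the query mechanism in Azure AI Search, but the main query process remains largely the same. Previously, vectorQueries could target only vector fields defined as top-level index fields. This feature relaxes that restriction and allows vectorQueries to target fields nested within a collection of complex types (up to one level deep). In addition, a new query-time parameter is available: perDocumentVectorLimit.

Setting perDocumentVectorLimit to 1 ensures that at most one vector per document is matched, which guarantees that results come from distinct documents.
Setting perDocumentVectorLimit to 0 (unlimited) allows multiple relevant vectors from the same document to be matched.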

{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "whales swimming",
      "k": 50,
      "fields": "scenes/embedding",
      "perDocumentVectorLimit": 0
    }
  ],
  "select": "title, scenes/timestamp, scenes/framePath"
}
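The request body above could be assembled programmatically along the following lines. This is a sketch under stated assumptions: `build_scene_query` is a hypothetical helper, the default field path and select list are taken from the example, and a `text` query presumes that a vectorizer is configured for the target field. Sending the request (endpoint, API version, authentication) is left out.

```python
import json
from typing import Any


def build_scene_query(text: str,
                      k: int = 50,
                      per_document_vector_limit: int = 0,
                      field: str = "scenes/embedding") -> dict[str, Any]:
    """Build a search request body that targets a nested vector field.

    per_document_vector_limit=1 keeps at most one matching vector per
    document; 0 means no limit, so several scenes of one movie can match.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if per_document_vector_limit < 0:
        raise ValueError("per_document_vector_limit must be 0 or greater")
    return {
        "vectorQueries": [
            {
                "kind": "text",
                "text": text,
                "k": k,
                "fields": field,
                "perDocumentVectorLimit": per_document_vector_limit,
            }
        ],
        "select": "title, scenes/timestamp, scenes/framePath",
    }


if __name__ == "__main__":
    # One best scene per movie, so every hit comes from a different document.
    body = build_scene_query("whales swimming", per_document_vector_limit=1)
    print(json.dumps(body, indent=2))
```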

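Ranking across multiple vectors in a single field

When multiple vectors are associated with a single document, Azure AI Search uses the maximum score among them for ranking. Each document is scored by its most relevant vector, which prevents dilution by less relevant vectors.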
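To see why taking the maximum avoids dilution, the toy example below ranks two documents once by their best vector and once by the average over all their vectors. It is purely illustrative: it uses plain cosine similarity on two-dimensional vectors and does not reproduce how the service converts similarity into a search score. All function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def max_score(query: list[float], vectors: list[list[float]]) -> float:
    """Score a document by its single most similar vector."""
    return max(cosine_similarity(query, v) for v in vectors)


def mean_score(query: list[float], vectors: list[list[float]]) -> float:
    """Score a document by the average over all of its vectors."""
    return sum(cosine_similarity(query, v) for v in vectors) / len(vectors)


def rank(query, documents, scorer):
    """Return document ids ordered from highest to lowest score."""
    return sorted(documents, key=lambda doc_id: scorer(query, documents[doc_id]),
                  reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    documents = {
        # One scene matches the query exactly, the other is unrelated.
        "a": [[1.0, 0.0], [0.0, 1.0]],
        # Both scenes are only moderately similar to the query.
        "b": [[0.8, 0.6], [0.8, 0.6]],
    }
    print("max :", rank(query, documents, max_score))
    print("mean:", rank(query, documents, mean_score))
```

With maximum scoring, document "a" ranks first because of its exact match. Averaging would let its unrelated scene pull it below document "b".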

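Retrieving relevant elements in a collection

When a collection of complex types is included in the $select parameter, only the elements that matched the vector query are returned. This is useful for retrieving associated metadata, such as timestamps, text descriptions, or image paths.

Note

To reduce payload size, avoid including the vector values themselves in the $select parameter. If you don't need them, consider omitting vector storage entirely.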
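Because only the matching elements come back, a client can flatten the response into one row per matched scene. The sketch below assumes the usual search response shape (a `value` array whose items carry `@search.score` plus the selected fields) and the field names from the sample index; `matched_scenes` is a hypothetical helper and the response shown is made up for illustration.

```python
from typing import Any


def matched_scenes(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a search response into one row per returned scene."""
    rows = []
    for hit in response.get("value", []):
        for scene in hit.get("scenes") or []:
            rows.append({
                "title": hit.get("title"),
                "score": hit.get("@search.score"),
                "timestamp": scene.get("timestamp"),
                "framePath": scene.get("framePath"),
            })
    return rows


if __name__ == "__main__":
    # Hand-written stand-in for a service response, not real output.
    sample_response = {
        "value": [
            {
                "@search.score": 0.91,
                "title": "Non-Existent Movie",
                "scenes": [
                    {"timestamp": 120,
                     "framePath": "nonexistentmovie\\scenes\\scene120.png"},
                ],
            }
        ]
    }
    for row in matched_scenes(sample_response):
        print(row)
```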

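Debugging multi-vector queries (preview)

When a document contains multiple embedding vectors (for example, text and image embeddings in different subfields), the system ranks the document by the highest vector score across all elements.

To debug how each vector contributed, use the innerHits debug mode (available in the latest preview REST API).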

POST /indexes/my-index/docs/search?api-version=2025-11-01-preview
{
  "vectorQueries": [
    {
      "kind": "vector",
      "fields": "keyframes/imageEmbedding",
      "k": 5,
      "vector": [ /* query vector */ ]
    }
  ],
  "debug": "innerHits"
}

Example response shape

"@search.documentDebugInfo": {
  "innerHits": {
    "keyframes": [
      {
        "ordinal": 0,
        "vectors": [
          {
            "imageEmbedding": {
              "searchScore": 0.958,
              "vectorSimilarity": 0.956
            },
            "textEmbedding": {
              "searchScore": 0.958,
              "vectorSimilarity": 0.956
            }
          }
        ]
      },
      {
        "ordinal": 1,
        "vectors": [
          {
            "imageEmbedding": null,
            "textEmbedding": {
              "searchScore": 0.872,
              "vectorSimilarity": 0.869
            }
          }
        ]
      }
    ]
  }
}

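Field descriptions

Field	Description
`ordinal`	Zero-based index of the element within the collection.
`vectors`	One entry for each searchable vector field contained in the element.
`searchScore`	Final score for the field, after any rescoring and boosting.
`vectorSimilarity`	Raw similarity returned by the distance function.

Note

innerHits currently reports only vector fields.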
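Given debug output of the shape shown above, a small helper can identify which collection element produced the highest score for a particular vector field. This sketch relies only on the structure of the example response (`innerHits`, `ordinal`, `vectors`, `searchScore`); the name `best_inner_hit` is hypothetical, and because the feature is in preview the response shape may change.

```python
from typing import Any, Optional


def best_inner_hit(debug_info: dict[str, Any],
                   collection: str,
                   vector_field: str) -> Optional[tuple[int, float]]:
    """Return (ordinal, searchScore) of the top-scoring element, or None."""
    best = None
    for element in debug_info.get("innerHits", {}).get(collection, []):
        for entry in element.get("vectors", []):
            hit = entry.get(vector_field)
            if hit is None:
                # This vector field did not contribute for this element.
                continue
            candidate = (element["ordinal"], hit["searchScore"])
            if best is None or candidate[1] > best[1]:
                best = candidate
    return best


if __name__ == "__main__":
    # Values copied from the example response shape.
    debug_info = {
        "innerHits": {
            "keyframes": [
                {"ordinal": 0, "vectors": [{
                    "imageEmbedding": {"searchScore": 0.958, "vectorSimilarity": 0.956},
                    "textEmbedding": {"searchScore": 0.958, "vectorSimilarity": 0.956},
                }]},
                {"ordinal": 1, "vectors": [{
                    "imageEmbedding": None,
                    "textEmbedding": {"searchScore": 0.872, "vectorSimilarity": 0.869},
                }]},
            ]
        }
    }
    print(best_inner_hit(debug_info, "keyframes", "textEmbedding"))
```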

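Relationship to debug=vector

Here are some facts about this property:

The existing debug=vector switch remains unchanged.
When used with multi-vector fields, @search.documentDebugInfo.vector.subscore shows the maximum score that was used to rank the parent document, but not per-element details.
Use innerHits to gain insight into how individual elements contribute to the score.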

Last updated on 2026-02-09
