How to map AI-enriched fields to a searchable index

Indexer Stages

In this article, you learn how to map enriched input fields to output fields in a searchable index. Once you have defined a skillset, you must map the output fields of any skill that directly contributes values to a given field in your search index.

Output field mappings are required for moving content from enriched documents into the index. The enriched document is really a tree of information, and even though the index supports complex types, you may sometimes want to transform information from the enriched tree into a simpler type (for instance, an array of strings). Output field mappings let you perform these data shape transformations by flattening information. Output field mappings always occur after skillset execution, although this stage can run even if no skillset is defined.

Examples of output field mappings:

  • As part of your skillset, you extracted the names of organizations mentioned in each of the pages of your document. Now you want to map each of those organization names into a field in your index of type Collection(Edm.String).

  • As part of your skillset, you produced a new node called "document/translated_text". You would like to map the information on this node to a specific field in your index.

  • You don't have a skillset but are indexing a complex type from a Cosmos DB database. You would like to get to a node on that complex type and map it into a field in your index.
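The three scenarios above could translate into output field mappings along the following lines. This is a minimal sketch: the source paths (the pages and address nodes) and every target field name are hypothetical, chosen only to illustrate the shape of each mapping, and they would need to match your own enrichment tree and index schema.

```python
import json

# All paths and field names below are assumptions made for illustration.
output_field_mappings = [
    # Scenario 1: organization names found on each page, collected into
    # a Collection(Edm.String) field.
    {
        "sourceFieldName": "/document/pages/*/organizations/*",
        "targetFieldName": "organizations",
    },
    # Scenario 2: a single node produced by a skill.
    {
        "sourceFieldName": "/document/translated_text",
        "targetFieldName": "translatedText",
    },
    # Scenario 3: a node inside a complex type read from the data source,
    # with no skillset involved.
    {
        "sourceFieldName": "/document/address/city",
        "targetFieldName": "city",
    },
]

# Print the fragment as it would appear inside an indexer definition.
print(json.dumps({"outputFieldMappings": output_field_mappings}, indent=4))
```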

Note

We recently enabled mapping functions on output field mappings. For more details on mapping functions, see Field mapping functions.

Use outputFieldMappings

To map fields, add outputFieldMappings to your indexer definition as shown below:

PUT https://[servicename].search.azure.cn/indexers/[indexer name]?api-version=2020-06-30
api-key: [admin key]
Content-Type: application/json

The body of the request is structured as follows:

{
    "name": "myIndexer",
    "dataSourceName": "myDataSource",
    "targetIndexName": "myIndex",
    "skillsetName": "myFirstSkillSet",
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "id",
            "mappingFunction": {
                "name": "base64Encode"
            }
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/content/organizations/*/description",
            "targetFieldName": "descriptions",
            "mappingFunction": {
                "name": "base64Decode"
            }
        },
        {
            "sourceFieldName": "/document/content/organizations",
            "targetFieldName": "orgNames"
        },
        {
            "sourceFieldName": "/document/content/sentiment",
            "targetFieldName": "sentiment"
        }
    ]
}

For each output field mapping, set the location of the data in the enriched document tree (sourceFieldName), and the name of the field as referenced in the index (targetFieldName).
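For readers who script their indexer updates, the request can be assembled with nothing but the Python standard library. The sketch below only builds the request object and does not send it; the helper name build_indexer_request, the service name "myservice", and the shortened indexer definition are assumptions for illustration, while the URL pattern, headers, and API version follow the request shown earlier.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2020-06-30"


def build_indexer_request(service_name, indexer_name, admin_key, definition):
    """Build (but do not send) the PUT request that creates or updates an indexer."""
    url = (
        f"https://{service_name}.search.azure.cn/indexers/"
        f"{urllib.parse.quote(indexer_name)}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"api-key": admin_key, "Content-Type": "application/json"},
        method="PUT",
    )


# A shortened definition: one output field mapping is enough to show the shape.
indexer_definition = {
    "name": "myIndexer",
    "dataSourceName": "myDataSource",
    "targetIndexName": "myIndex",
    "skillsetName": "myFirstSkillSet",
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/content/sentiment",
            "targetFieldName": "sentiment",
        }
    ],
}

request = build_indexer_request("myservice", "myIndexer", "<admin key>", indexer_definition)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL and body before anything reaches the service.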

Flattening information from complex types

The path in a sourceFieldName can represent one element or multiple elements. In the example above, /document/content/sentiment represents a single numeric value, while /document/content/organizations/*/description represents several organization descriptions.

When a path matches several elements, they are "flattened" into an array that contains each of the elements.

More concretely, for the /document/content/organizations/*/description example, the data in the descriptions field would look like a flat array of descriptions before it gets indexed:

 ["Microsoft is a company in Seattle","LinkedIn's office is in San Francisco"]
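One way to picture this flattening is a small path resolver that fans out over every array element whenever it meets a "*" segment. The sketch below is a local model for reasoning about paths, not the service's implementation; the resolve_path helper and the sample enriched document (including the sentiment value) are assumptions for illustration.

```python
from typing import Any


def resolve_path(document: dict, path: str) -> Any:
    """Resolve an enrichment-tree path; '*' fans out over each element of a list.

    Returns a single value for plain paths and a flat list when the path
    contains at least one '*'.
    """
    segments = path.strip("/").split("/")
    if segments[0] != "document":
        raise ValueError("paths are expected to start with /document")

    nodes = [document]
    fanned_out = False
    for segment in segments[1:]:
        next_nodes = []
        for node in nodes:
            if segment == "*":
                next_nodes.extend(node)           # one entry per array element
            else:
                next_nodes.append(node[segment])  # descend into a named member
        if segment == "*":
            fanned_out = True
        nodes = next_nodes
    return nodes if fanned_out else nodes[0]


# Hypothetical enriched document shaped like the paths used in this article.
enriched = {
    "content": {
        "sentiment": 0.9,
        "organizations": [
            {"description": "Microsoft is a company in Seattle"},
            {"description": "LinkedIn's office is in San Francisco"},
        ],
    }
}

print(resolve_path(enriched, "/document/content/sentiment"))
print(resolve_path(enriched, "/document/content/organizations/*/description"))
```

The first call yields a single number and the second a flat list of two descriptions, mirroring the difference between a single-element path and a multi-element path.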

This is an important principle, so here is another example. Imagine that the enrichment tree contains a member called customEntities that holds an array of complex types like the one below.

"document/customEntities": 
[
    {
        "name": "heart failure",
        "matches": [
            {
                "text": "heart failure",
                "offset": 10,
                "length": 12,
                "matchDistance": 0.0
            }
        ]
    },
    {
        "name": "morquio",
        "matches": [
            {
                "text": "morquio",
                "offset": 25,
                "length": 7,
                "matchDistance": 0.0
            }
        ]
    }
    //...
]

Let's assume that your index has a field called "diseases" of type Collection(Edm.String), where you would like to store the name of each entity.

This can be done easily by using the "*" symbol, as follows:

    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/customEntities/*/name",
            "targetFieldName": "diseases"
        }
    ]

This operation simply "flattens" the names of the customEntities elements into a single array of strings like this:

  "diseases" : ["heart failure","morquio"]

Next steps

Once you have mapped your enriched fields to searchable fields, you can set the field attributes for each of the searchable fields as part of the index definition.

For more information about field mapping, see Field mappings in Azure Cognitive Search indexers.