How to reference annotations in an Azure Cognitive Search skillset

In this article, you learn how to reference annotations in skill definitions, with examples that illustrate several scenarios. As the content of a document flows through a set of skills, it is enriched with annotations. Annotations can be used as inputs for further downstream enrichment, or mapped to an output field in an index.

Examples in this article are based on the content field generated automatically by Azure Blob indexers as part of the document cracking phase. When referring to documents from a Blob container, use a path such as "/document/content", where the content field is part of the document.

Background concepts

Before reviewing the syntax, let's revisit a few important concepts to better understand the examples provided later in this article.

Enriched Document: An enriched document is an internal structure created and used by the pipeline to hold all annotations related to a document. Think of an enriched document as a tree of annotations. Generally, an annotation created from a previous annotation becomes its child.

Enriched documents exist only for the duration of skillset execution. Once content is mapped to the search index, the enriched document is no longer needed. Although you don't interact with enriched documents directly, it's useful to have a mental model of them when creating a skillset.

Enrichment Context: The context in which the enrichment takes place, that is, which element is enriched. By default, the enrichment context is at the "/document" level, scoped to individual documents. When a skill runs, the outputs of that skill become properties of the defined context.

Example 1: Simple annotation reference

Suppose you have a variety of files in Azure Blob storage that contain references to people's names, and you want to extract those names using entity recognition. In the skill definition below, "/document/content" is the textual representation of the entire document, and "people" is an extraction of full names for entities identified as persons.

Because the default context is "/document", the list of people can now be referenced as "/document/people". In this case, "/document/people" is an annotation, which can now be mapped to a field in an index or used in another skill in the same skillset.

  {
    "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
    "categories": [ "Person"],
    "defaultLanguageCode": "en",
    "inputs": [
      {
        "name": "text",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "persons",
        "targetName": "people"
      }
    ]
  }
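
As a rough mental model only, the following Python sketch treats the enriched document as a nested dictionary and shows how the skill output above would become a property of the "/document" context. The `get_node` and `add_annotation` helpers, the sample text, and the names are all hypothetical; the service's internal representation is not documented here and may differ.

```python
# Hypothetical in-memory model of an enriched document (not the service's real structure).
enriched = {"document": {"content": "Ana Silva met Ben Okoro in Lisbon."}}


def get_node(tree, path):
    """Walk a path such as "/document/content" one segment at a time."""
    node = tree
    for segment in path.strip("/").split("/"):
        node = node[segment]
    return node


def add_annotation(tree, context, target_name, value):
    """Attach a skill output as a property of the node named by the context."""
    get_node(tree, context)[target_name] = value


# Made-up result standing in for what an entity recognition skill might return.
add_annotation(enriched, "/document", "people", ["Ana Silva", "Ben Okoro"])

# The annotation is now addressable by its path.
print(get_node(enriched, "/document/people"))
```

In this model, the "targetName" of the output ("people") is simply the key added under the context node, which is why the annotation ends up at "/document/people".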

Example 2: Reference an array within a document

This example builds on the previous one, showing how to invoke an enrichment step multiple times over the same document. Assume the previous example generated an array of strings containing 10 names from a single document. A reasonable next step might be a second enrichment that extracts the last name from each full name. Because there are 10 names, you want this step to be called 10 times for this document, once for each person.

To invoke the right number of iterations, set the context to "/document/people/*", where the asterisk ("*") represents all the nodes in the enriched document that are descendants of "/document/people". Although this skill is defined only once in the skills array, it is called for each member within the document until all members are processed.

  {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Fictitious skill that gets the last name from a full name",
    "uri": "http://names.azurewebsites.net/api/GetLastName",
    "context" : "/document/people/*",
    "defaultLanguageCode": "en",
    "inputs": [
      {
        "name": "fullname",
        "source": "/document/people/*"
      }
    ],
    "outputs": [
      {
        "name": "lastname",
        "targetName": "last"
      }
    ]
  }

When annotations are arrays or collections of strings, you might want to target specific members rather than the array as a whole. The above example generates an annotation called "last" under each node represented by the context. To refer to this family of annotations, use the syntax "/document/people/*/last". To refer to a particular annotation, use an explicit index: "/document/people/0/last" references the last name of the first person identified in the document. Notice that in this syntax arrays are zero-indexed.
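
The difference between the wildcard and an explicit index can be sketched with a small path resolver in Python. This is an illustration under stated assumptions, not the service's implementation: the `resolve` function, the dictionary layout (each person node holding a hypothetical "value" key plus its "last" annotation), and the names are invented for the example.

```python
# Hypothetical enriched document after the last-name skill has run.
tree = {
    "document": {
        "people": [
            {"value": "Ana Silva", "last": "Silva"},
            {"value": "Ben Okoro", "last": "Okoro"},
            {"value": "Carla Silva", "last": "Silva"},
        ]
    }
}


def resolve(tree, path):
    """Return every node matched by a path; "*" fans out over array members."""
    nodes = [tree]
    for segment in path.strip("/").split("/"):
        matched = []
        for node in nodes:
            if segment == "*":
                matched.extend(node)                # every member of the array
            elif isinstance(node, list):
                matched.append(node[int(segment)])  # explicit zero-based index
            else:
                matched.append(node[segment])       # named child annotation
        nodes = matched
    return nodes


all_last_names = resolve(tree, "/document/people/*/last")
first_last_name = resolve(tree, "/document/people/0/last")
print(all_last_names)
print(first_last_name)
```

The wildcard path yields one result per person, while the indexed path yields a single result for the member at position 0.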

Example 3: Reference members within an array

Sometimes you need to group all annotations of a particular type to pass them to a particular skill. Consider a hypothetical custom skill that identifies the most common last name from all the last names extracted in Example 2. To provide just the last names to the custom skill, specify the context as "/document" and the input as "/document/people/*/last", which is the annotation path produced by the "last" target name in Example 2.

Notice that the cardinality of "/document/people/*/last" is larger than that of the document. There may be 10 "last" nodes while there is only one document node for this document. In that case, the system automatically creates an array from "/document/people/*/last" containing all of the matching elements in the document.

  {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Fictitious skill that gets the most common string from an array of strings",
    "uri": "http://names.azurewebsites.net/api/MostCommonString",
    "context" : "/document",
    "inputs": [
      {
        "name": "strings",
        "source": "/document/people/*/last"
      }
    ],
    "outputs": [
      {
        "name": "mostcommon",
        "targetName": "common-lastname"
      }
    ]
  }
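
To make the role of the context concrete, the following Python sketch contrasts a per-person context with a document-level context. It is only a model under assumptions: the dictionary layout, the hypothetical "value" key, the names, and the use of `collections.Counter` as a stand-in for the fictitious most-common-string skill are all invented for illustration.

```python
from collections import Counter

# Hypothetical enriched document after the last-name skill from Example 2 has run.
enriched = {
    "document": {
        "people": [
            {"value": "Ana Silva", "last": "Silva"},
            {"value": "Ben Okoro", "last": "Okoro"},
            {"value": "Carla Silva", "last": "Silva"},
        ]
    }
}
people = enriched["document"]["people"]

# Context "/document/people/*": the skill is invoked once per person.
per_person_calls = [{"fullname": person["value"]} for person in people]

# Context "/document": the skill is invoked once, and the wildcard input
# is gathered into a single array of last names.
single_call = {"strings": [person["last"] for person in people]}

# Stand-in for the fictitious skill: pick the most frequent string.
most_common = Counter(single_call["strings"]).most_common(1)[0][0]

# The output becomes a property of the context node, here "/document".
enriched["document"]["common-lastname"] = most_common
print(len(per_person_calls), single_call["strings"], most_common)
```

Because the context is "/document", the result is stored once per document rather than once per person.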

See also