Text Merge cognitive skill

The Text Merge skill consolidates text from a collection of fields into a single field.

Note

This skill is not bound to a Cognitive Services API, and you are not charged for using it. You should still attach a Cognitive Services resource, however, to override the Free resource option, which limits you to a small number of enrichments per day.

@odata.type

Microsoft.Skills.Text.MergeSkill

Skill parameters

Parameters are case-sensitive.

Parameter name    Description
insertPreTag    String to be included before every insertion. The default value is " ". To omit the space, set the value to "".
insertPostTag    String to be included after every insertion. The default value is " ". To omit the space, set the value to "".
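As a rough illustration of these two parameters, the sketch below wraps a single inserted string in its tags. `wrap_insertion` is a hypothetical helper, not part of the service; it only assumes that the tags are concatenated directly around each item, as the sample output below suggests.

```python
# A minimal sketch (not the service's implementation) of how insertPreTag and
# insertPostTag are assumed to wrap each inserted string.
def wrap_insertion(item: str, pre_tag: str = " ", post_tag: str = " ") -> str:
    """Return item surrounded by the pre/post tags; defaults mirror the documented " "."""
    return f"{pre_tag}{item}{post_tag}"


if __name__ == "__main__":
    print(repr(wrap_insertion("quick")))               # ' quick '  (default tags)
    print(repr(wrap_insertion("quick", post_tag="")))  # ' quick'   (trailing space omitted)
```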

Sample input

A JSON document that provides usable input for this skill could be:

{
  "values": [
    {
      "recordId": "1",
      "data":
      {
        "text": "The brown fox jumps over the dog",
        "itemsToInsert": ["quick", "lazy"],
        "offsets": [3, 28]
      }
    }
  ]
}

Sample output

This example shows the output for the previous input, assuming that insertPreTag is set to " " and insertPostTag is set to "".

{
  "values": [
    {
      "recordId": "1",
      "data":
      {
        "mergedText": "The quick brown fox jumps over the lazy dog"
      }
    }
  ]
}
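The sample output can be reproduced with the following minimal Python sketch. It assumes, based on the example, that each offset is a character position in the original text and that each item is wrapped in insertPreTag and insertPostTag before being inserted at that position. `merge_text` is a hypothetical helper for illustration, not part of any Azure SDK.

```python
from typing import List


def merge_text(text: str, items: List[str], offsets: List[int],
               pre_tag: str = " ", post_tag: str = " ") -> str:
    """Insert each item into text at its character offset.

    Assumes offsets index into the ORIGINAL text, so insertions are applied
    in ascending offset order while copying the untouched slices in between.
    """
    if len(items) != len(offsets):
        raise ValueError("itemsToInsert and offsets must have the same length")
    pieces: List[str] = []
    cursor = 0
    # sorted() is stable, so items sharing an offset keep their input order.
    for offset, item in sorted(zip(offsets, items), key=lambda pair: pair[0]):
        if not 0 <= offset <= len(text):
            raise ValueError(f"offset {offset} is outside the text")
        pieces.append(text[cursor:offset])           # original text up to the offset
        pieces.append(f"{pre_tag}{item}{post_tag}")  # the wrapped insertion
        cursor = offset
    pieces.append(text[cursor:])                      # remainder of the original text
    return "".join(pieces)


if __name__ == "__main__":
    record = {"text": "The brown fox jumps over the dog",
              "itemsToInsert": ["quick", "lazy"],
              "offsets": [3, 28]}
    merged = merge_text(record["text"], record["itemsToInsert"], record["offsets"],
                        pre_tag=" ", post_tag="")
    print(merged)  # The quick brown fox jumps over the lazy dog
```

Under this sketch's assumptions, keeping the default tags (" " on both sides) would yield "The quick  brown fox jumps over the lazy  dog", with a doubled space after each insertion, because both offsets point at an existing space. That is presumably why the example sets insertPostTag to "".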

Extended sample skillset definition

A common scenario for using Text Merge is to merge the textual representation of images (text from an OCR skill, or the caption of an image) into the content field of a document.

The following example skillset uses the OCR skill to extract text from images embedded in the document. Next, it creates a merged_text field that contains the original content text, with the OCR text from each image inserted at that image's location. For more information, see the OCR skill documentation.

{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "en",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text", 
          "source": "/document/content"
        },
        {
          "name": "itemsToInsert", 
          "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", 
          "source": "/document/normalized_images/*/contentOffset" 
        }
      ],
      "outputs": [
        {
          "name": "mergedText", 
          "targetName" : "merged_text"
        }
      ]
    }
  ]
}
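To see how the `*` paths in this skillset feed the merge, the sketch below emulates the MergeSkill step for one document in Python. It assumes that "/document/normalized_images/*/text" and "/document/normalized_images/*/contentOffset" each collect one value per image into parallel arrays, and that insertion follows the offset semantics of the earlier sample. The document contents and the helpers `merge_text` and `run_merge_skill` are hypothetical and for illustration only.

```python
from typing import Dict, List


def merge_text(text: str, items: List[str], offsets: List[int],
               pre_tag: str = " ", post_tag: str = " ") -> str:
    """Insert each wrapped item at its offset in the original text (assumed semantics)."""
    pieces: List[str] = []
    cursor = 0
    for offset, item in sorted(zip(offsets, items), key=lambda pair: pair[0]):
        pieces.append(text[cursor:offset])
        pieces.append(f"{pre_tag}{item}{post_tag}")
        cursor = offset
    pieces.append(text[cursor:])
    return "".join(pieces)


def run_merge_skill(document: Dict) -> Dict:
    """Emulate the MergeSkill inputs and outputs from the skillset above for one document."""
    images = document.get("normalized_images", [])
    # Each "*" path yields one value per image, giving two parallel arrays.
    items = [image["text"] for image in images]
    offsets = [image["contentOffset"] for image in images]
    # Tags match the skillset: insertPreTag " " and insertPostTag " ".
    document["merged_text"] = merge_text(document["content"], items, offsets, " ", " ")
    return document


if __name__ == "__main__":
    # Hypothetical enriched document; real values come from document cracking and OCR.
    doc = {
        "content": "Page one. Page two.",
        "normalized_images": [
            {"contentOffset": 9, "text": "Sales up"},
            {"contentOffset": 19, "text": "Logo"},
        ],
    }
    print(repr(run_merge_skill(doc)["merged_text"]))  # 'Page one. Sales up  Page two. Logo '
```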

The example above assumes that a normalized_images field exists. To get the normalized_images field, set the imageAction configuration in your indexer definition to generateNormalizedImages, as shown below:

{
  //...rest of your indexer definition goes here ...
  "parameters":{
    "configuration":{
        "dataToExtract":"contentAndMetadata",
        "imageAction":"generateNormalizedImages"
    }
  }
}
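If you generate indexer definitions from code, a small helper such as the one below could set these two configuration values. It is only a sketch: `with_normalized_images` is a hypothetical helper, and the indexer name is a placeholder. The `//` line in the snippet above stands in for the rest of your definition; standard JSON does not allow comments, and Python's `json.dumps` never emits them.

```python
import copy
import json


def with_normalized_images(indexer: dict) -> dict:
    """Return a copy of an indexer definition with normalized image generation enabled."""
    updated = copy.deepcopy(indexer)  # leave the caller's dict untouched
    config = updated.setdefault("parameters", {}).setdefault("configuration", {})
    config["dataToExtract"] = "contentAndMetadata"
    config["imageAction"] = "generateNormalizedImages"
    return updated


if __name__ == "__main__":
    # Hypothetical minimal definition; a real indexer has more required properties.
    indexer = {"name": "my-indexer"}
    print(json.dumps(with_normalized_images(indexer), indent=2))
```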

See also