Key Phrase Extraction cognitive skill

The Key Phrase Extraction skill evaluates unstructured text and, for each record, returns a list of key phrases. This skill uses the machine learning models provided by Text Analytics in Cognitive Services.

This capability is useful if you need to quickly identify the main talking points in a record. For example, given the input text "The food was delicious and there were wonderful staff", the service returns "food" and "wonderful staff".

Note

As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Cognitive Services resource. Charges accrue when calling APIs in Cognitive Services, and for image extraction as part of the document-cracking stage in Azure Cognitive Search. There are no charges for text extraction from documents.

Execution of built-in skills is charged at the existing Cognitive Services pay-as-you-go price. Image extraction pricing is described on the Azure Cognitive Search pricing page.

@odata.type

Microsoft.Skills.Text.KeyPhraseExtractionSkill

Data limits

The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the key phrase extractor, consider using the Text Split skill.
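
The following is a minimal sketch of how a record could be checked against this limit before it reaches the skill. It assumes that String.Length refers to the .NET property, which counts UTF-16 code units, so characters outside the Basic Multilingual Plane count as two. The helper names `utf16_length` and `split_by_utf16_length` are hypothetical, and the splitter is a plain character-based cut, not the Text Split skill.

```python
from __future__ import annotations

MAX_RECORD_CHARS = 50_000  # limit stated in this article


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit .NET's String.Length reports."""
    # Code points above U+FFFF need a surrogate pair (two code units).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def split_by_utf16_length(text: str, limit: int = MAX_RECORD_CHARS) -> list[str]:
    """Greedily cut text into chunks of at most `limit` UTF-16 code units.

    Hypothetical helper: it never splits a surrogate pair, but it does
    not respect word or sentence boundaries.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for ch in text:
        ch_len = 2 if ord(ch) > 0xFFFF else 1
        if current and current_len + ch_len > limit:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(ch)
        current_len += ch_len
    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    record = "a" * 120_000
    print(utf16_length(record) > MAX_RECORD_CHARS)  # the record is too large
    print([utf16_length(c) for c in split_by_utf16_length(record)])
```

In a real pipeline, a boundary-aware splitter is usually preferable, because cutting mid-sentence can change which phrases are extracted.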

Skill parameters

Parameters are case-sensitive.

Parameter Description
defaultLanguageCode (Optional) The language code to apply to documents that don't specify a language explicitly. If the default language code is not specified, English (en) is used. See the full list of supported languages.
maxKeyPhraseCount (Optional) The maximum number of key phrases to produce.
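
As an illustration of how these optional parameters could be set, the sketch below assembles a skill definition as a Python dictionary and prints it as JSON. It assumes that `defaultLanguageCode` and `maxKeyPhraseCount` sit at the top level of the skill definition, next to `inputs` and `outputs`. The function `build_key_phrase_skill` and its argument names are hypothetical.

```python
import json


def build_key_phrase_skill(
    text_source="/document/content",
    default_language_code=None,
    max_key_phrase_count=None,
    target_name="myKeyPhrases",
):
    """Return a Key Phrase Extraction skill definition as a dict.

    Hypothetical helper. Assumption: the optional parameters are
    top-level properties of the skill definition.
    """
    skill = {
        "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
        "inputs": [{"name": "text", "source": text_source}],
        "outputs": [{"name": "keyPhrases", "targetName": target_name}],
    }
    # Parameter names are case-sensitive, so they are spelled exactly
    # as in the table above. Unset parameters are left out entirely.
    if default_language_code is not None:
        skill["defaultLanguageCode"] = default_language_code
    if max_key_phrase_count is not None:
        skill["maxKeyPhraseCount"] = max_key_phrase_count
    return skill


if __name__ == "__main__":
    skill = build_key_phrase_skill(default_language_code="en", max_key_phrase_count=5)
    print(json.dumps(skill, indent=2))
```

Leaving a parameter unset, rather than sending a placeholder value, lets the service apply its own default.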

Skill inputs

Input Description
text The text to be analyzed.
languageCode A string indicating the language of the records. If this parameter is not specified, the default language code is used to analyze the records. See the full list of supported languages.

Skill outputs

Output Description
keyPhrases A list of key phrases extracted from the input text. The key phrases are returned in order of importance.

Sample definition

Consider a SQL record that has the following fields:

{
    "content": "Glaciers are huge rivers of ice that ooze their way over land, powered by gravity and their own sheer weight. They accumulate ice from snowfall and lose it through melting. As global temperatures have risen, many of the world’s glaciers have already started to shrink and retreat. Continued warming could see many iconic landscapes – from the Canadian Rockies to the Mount Everest region of the Himalayas – lose almost all their glaciers by the end of the century.",
    "language": "en"
}

Then your skill definition may look like this:

{
    "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
    "inputs": [
      {
        "name": "text",
        "source": "/document/content"
      },
      {
        "name": "languageCode",
        "source": "/document/language"
      }
    ],
    "outputs": [
      {
        "name": "keyPhrases",
        "targetName": "myKeyPhrases"
      }
    ]
}

Sample output

For the example above, the output of your skill is written to a new node in the enriched tree called "document/myKeyPhrases", since that is the targetName we specified. If you don't specify a targetName, the node is "document/keyPhrases".

document/myKeyPhrases

            [
              "world’s glaciers", 
              "huge rivers of ice", 
              "Canadian Rockies", 
              "iconic landscapes",
              "Mount Everest region",
              "Continued warming"
            ]

You may use "document/myKeyPhrases" as input into other skills, or as a source of an output field mapping.
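
The naming rule for the output node, and its use as a mapping source, can be sketched as follows. The helper names are hypothetical. The mapping shape (`sourceFieldName` and `targetFieldName`, with a leading slash on the source path) is an assumption based on how indexer output field mappings are commonly written, and the index field name `keyPhrases` is purely illustrative.

```python
def output_node_path(output_name="keyPhrases", target_name=None):
    """Return the enriched-tree node that a skill output is written to.

    The node takes the targetName when one is given, and otherwise
    falls back to the name of the output itself.
    """
    return f"document/{target_name or output_name}"


def output_field_mapping(node_path, index_field):
    """Build one output field mapping entry (assumed shape, see above)."""
    return {
        "sourceFieldName": "/" + node_path.lstrip("/"),
        "targetFieldName": index_field,
    }


if __name__ == "__main__":
    node = output_node_path(target_name="myKeyPhrases")
    print(node)
    print(output_node_path())
    print(output_field_mapping(node, "keyPhrases"))
```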

Errors and warnings

If you provide an unsupported language code, an error is generated and key phrases are not extracted. If your text is empty, a warning is produced. If your text is larger than 50,000 characters, only the first 50,000 characters are analyzed and a warning is issued.
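
To make the three cases concrete, here is a small local model of the behavior described above. It is a sketch for reasoning about pipeline inputs, not the service's implementation. `SUPPORTED_LANGUAGES` is a made-up subset, `check_record` and `CheckResult` are hypothetical names, and Python's `len` counts code points rather than String.Length units, so results can differ for text containing characters outside the Basic Multilingual Plane.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_CHARS = 50_000
# Hypothetical subset for illustration; consult the supported languages list.
SUPPORTED_LANGUAGES = {"en", "de", "es"}


@dataclass
class CheckResult:
    text: str | None  # text that would be analyzed, or None on error
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def check_record(text: str, language_code: str = "en") -> CheckResult:
    """Classify a record the way this section describes."""
    if language_code not in SUPPORTED_LANGUAGES:
        # Unsupported language: an error, and nothing is extracted.
        return CheckResult(text=None, errors=[f"unsupported language code: {language_code}"])
    result = CheckResult(text=text)
    if not text:
        # Empty text: a warning is produced.
        result.warnings.append("text is empty")
    elif len(text) > MAX_CHARS:
        # Oversized text: only the first 50,000 characters are analyzed.
        result.text = text[:MAX_CHARS]
        result.warnings.append(f"text truncated to {MAX_CHARS} characters")
    return result


if __name__ == "__main__":
    print(check_record("", "en").warnings)
    print(len(check_record("x" * 60_000).text))
    print(check_record("hello", "xx").errors)
```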

See also