Named Entity Recognition cognitive skill

The Named Entity Recognition skill extracts named entities from text. The available entity types are person, location, and organization.

Important

The Named Entity Recognition skill is discontinued and has been replaced by Microsoft.Skills.Text.EntityRecognitionSkill. Support stopped on February 15, 2019, and the API was removed from the product on May 2, 2019. Follow the recommendations in Deprecated cognitive search skills to migrate to a supported skill.
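As a rough illustration of the first migration step, the sketch below swaps the deprecated skill type for its replacement in a skillset held as a Python dictionary. The skillset shape (a top-level `skills` array) and the helper name `retarget_skills` are assumptions made for this example. Changing the type string may not be the only change needed, so treat the Deprecated cognitive search skills guidance as the authority.

```python
import copy

OLD_TYPE = "#Microsoft.Skills.Text.NamedEntityRecognitionSkill"
NEW_TYPE = "#Microsoft.Skills.Text.EntityRecognitionSkill"


def retarget_skills(skillset):
    """Return a copy of the skillset with the deprecated skill type replaced.

    Hypothetical helper: it only rewrites "@odata.type" and leaves every
    other property of each skill untouched.
    """
    updated = copy.deepcopy(skillset)
    for skill in updated.get("skills", []):
        if skill.get("@odata.type") == OLD_TYPE:
            skill["@odata.type"] = NEW_TYPE
    return updated


# Assumed skillset shape, for illustration only.
skillset = {
    "name": "demo-skillset",
    "skills": [
        {
            "@odata.type": OLD_TYPE,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "persons", "targetName": "people"}],
        }
    ],
}

migrated = retarget_skills(skillset)
print(migrated["skills"][0]["@odata.type"])
```

The helper works on a deep copy so the original definition stays available for comparison or rollback.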

Note

As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Cognitive Services resource. Charges accrue when calling APIs in Cognitive Services and for image extraction during the document-cracking stage in Azure Cognitive Search. There are no charges for text extraction from documents.

Execution of built-in skills is charged at the existing Cognitive Services pay-in-advance price. Image extraction pricing is described on the Azure Cognitive Search pricing page.

@odata.type

Microsoft.Skills.Text.NamedEntityRecognitionSkill

Data limits

The maximum size of a record should be 50,000 characters, as measured by String.Length. If you need to break up your data before sending it to this skill, consider using the Text Split skill.
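To make the limit concrete, here is a minimal sketch that measures text the way .NET's String.Length does (in UTF-16 code units) and cuts an oversized record into pieces that fit. It is not the Text Split skill: the function names are made up for this example, and the split is a plain greedy cut that ignores sentence and word boundaries.

```python
MAX_RECORD_CHARS = 50_000  # limit stated above, in String.Length units


def utf16_length(text):
    """Length in UTF-16 code units, which is what String.Length reports."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def split_record(text, limit=MAX_RECORD_CHARS):
    """Greedily split text into chunks of at most `limit` UTF-16 code units.

    Hypothetical helper: characters outside the Basic Multilingual Plane
    count as two units and are never split across chunks.
    """
    chunks, current, current_len = [], [], 0
    for ch in text:
        ch_len = 2 if ord(ch) > 0xFFFF else 1
        if current and current_len + ch_len > limit:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(ch)
        current_len += ch_len
    if current:
        chunks.append("".join(current))
    return chunks


pieces = split_record("a" * 120_000)
print([utf16_length(p) for p in pieces])
```

In Python, `len()` counts code points rather than UTF-16 code units, so the two measures differ for text that contains emoji or other supplementary characters.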

Skill parameters

Parameters are case-sensitive.

Parameter name    Description
categories    Array of categories that should be extracted. Possible category types: "Person", "Location", "Organization". If no category is provided, all types are returned.
defaultLanguageCode    Language code of the input text. The following languages are supported: de, en, es, fr, it.
minimumPrecision    A number between 0 and 1. If the precision is lower than this value, the entity is not returned. The default is 0.
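The parameters above can be checked before a definition is submitted. The sketch below builds a skill definition as a Python dictionary and rejects values outside the documented ranges. The helper name `build_skill` is hypothetical, the input and output mappings mirror the sample definition in this article, and matching category values exactly is an assumption of the sketch rather than documented behavior.

```python
ALLOWED_CATEGORIES = ("Person", "Location", "Organization")
SUPPORTED_LANGUAGES = ("de", "en", "es", "fr", "it")


def build_skill(categories=None, default_language_code="en", minimum_precision=0.0):
    """Return a skill definition dict after validating the documented parameters."""
    if categories is not None:
        # Assumption: category values must match the documented spelling exactly.
        unknown = [c for c in categories if c not in ALLOWED_CATEGORIES]
        if unknown:
            raise ValueError(f"Unsupported categories: {unknown}")
    if default_language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {default_language_code!r}")
    if not 0 <= minimum_precision <= 1:
        raise ValueError("minimumPrecision must be between 0 and 1")

    skill = {
        "@odata.type": "#Microsoft.Skills.Text.NamedEntityRecognitionSkill",
        "defaultLanguageCode": default_language_code,
        "minimumPrecision": minimum_precision,
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "persons", "targetName": "people"}],
    }
    if categories:
        # Omitting "categories" means all types are returned.
        skill["categories"] = list(categories)
    return skill


skill = build_skill(categories=["Person", "Organization"], minimum_precision=0.5)
print(skill["categories"], skill["minimumPrecision"])
```

Because parameter names are case-sensitive, the dictionary keys are spelled exactly as they appear in the table.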

Skill inputs

Input name    Description
languageCode    Optional. Default is "en".
text    The text to analyze.
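As an illustration of how the two inputs fit together, the following sketch assembles a payload in the same shape as the sample input shown later in this article. The function name `build_request` and the sequential record IDs are choices made for this example only.

```python
import json


def build_request(texts, language_code="en"):
    """Wrap each text in a record carrying the two documented inputs.

    Hypothetical helper: record IDs are assigned sequentially from "1".
    """
    return {
        "values": [
            {
                "recordId": str(index),
                "data": {"text": text, "languageCode": language_code},
            }
            for index, text in enumerate(texts, start=1)
        ]
    }


payload = build_request(["Ana Smith is provided as a reference."])
print(json.dumps(payload, indent=2))
```

`languageCode` is optional, so the helper falls back to "en", which matches the documented default.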

Skill outputs

Output name    Description
persons    An array of strings where each string represents the name of a person.
locations    An array of strings where each string represents a location.
organizations    An array of strings where each string represents an organization.
entities    An array of complex types. Each complex type includes the following fields:
  • category ("person", "organization", or "location")
  • value (the actual entity name)
  • offset (the location where it was found in the text)
  • confidence (a value between 0 and 1 that represents the confidence that the value is an actual entity)
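The `entities` output carries more detail than the three string arrays, so a consumer may want to filter it. The sketch below groups entity values by category and drops anything below a chosen confidence. The helper name `group_entities` and the 0.9 threshold are illustrative assumptions, and the entity records are copied from the sample output in this article.

```python
from collections import defaultdict


def group_entities(entities, min_confidence=0.0):
    """Group entity values by category, keeping those at or above min_confidence.

    Hypothetical helper: values are listed in order of their offset in the text.
    """
    grouped = defaultdict(list)
    for entity in sorted(entities, key=lambda e: e["offset"]):
        if entity["confidence"] >= min_confidence:
            grouped[entity["category"]].append(entity["value"])
    return dict(grouped)


# Entity records taken from the sample output in this article.
entities = [
    {"category": "person", "value": "Joe Romero", "offset": 33, "confidence": 0.87},
    {"category": "person", "value": "Ana Smith", "offset": 124, "confidence": 0.87},
    {"category": "location", "value": "Chile", "offset": 88, "confidence": 0.99},
    {"category": "location", "value": "Australia", "offset": 112, "confidence": 0.99},
    {"category": "organization", "value": "Microsoft", "offset": 54, "confidence": 0.99},
]

high_confidence = group_entities(entities, min_confidence=0.9)
print(high_confidence)
```

Filtering on the client side like this is an alternative to setting `minimumPrecision` on the skill when different consumers need different thresholds.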

Sample definition

  {
    "@odata.type": "#Microsoft.Skills.Text.NamedEntityRecognitionSkill",
    "categories": [ "Person", "Location", "Organization"],
    "defaultLanguageCode": "en",
    "inputs": [
      {
        "name": "text",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "persons",
        "targetName": "people"
      }
    ]
  }

Sample input

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "This is the loan application for Joe Romero, a Microsoft employee who was born in Chile and who then moved to Australia… Ana Smith is provided as a reference.",
             "languageCode": "en"
           }
      }
    ]
}

Sample output

{
  "values": [
    {
      "recordId": "1",
      "data" : 
      {
        "persons": [ "Joe Romero", "Ana Smith"],
        "locations": ["Chile", "Australia"],
        "organizations":["Microsoft"],
        "entities":  
        [
          {
            "category":"person",
            "value": "Joe Romero",
            "offset": 33,
            "confidence": 0.87
          },
          {
            "category":"person",
            "value": "Ana Smith",
            "offset": 124,
            "confidence": 0.87
          },
          {
            "category":"location",
            "value": "Chile",
            "offset": 88,
            "confidence": 0.99
          },
          {
            "category":"location",
            "value": "Australia",
            "offset": 112,
            "confidence": 0.99
          },
          {
            "category":"organization",
            "value": "Microsoft",
            "offset": 54,
            "confidence": 0.99
          }
        ]
      }
    }
  ]
}

Error cases

If the language code for the document is unsupported, an error is returned and no entities are extracted.

See also
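One way to avoid this error is to check language codes before records are submitted. The sketch below separates records with a supported code from the rest, using the language list from the skill parameters. The function name `partition_by_language` and the record shape (taken from the sample input) are assumptions for this example, and nothing here reproduces the service's own error response.

```python
SUPPORTED_LANGUAGES = frozenset({"de", "en", "es", "fr", "it"})


def partition_by_language(records, default_language="en"):
    """Split records into (supported, unsupported) by their languageCode.

    Hypothetical helper: a record without languageCode falls back to the
    documented default of "en".
    """
    supported, unsupported = [], []
    for record in records:
        code = record.get("data", {}).get("languageCode", default_language)
        if code in SUPPORTED_LANGUAGES:
            supported.append(record)
        else:
            unsupported.append(record)
    return supported, unsupported


records = [
    {"recordId": "1", "data": {"text": "Joe Romero works at Microsoft.", "languageCode": "en"}},
    {"recordId": "2", "data": {"text": "Example text.", "languageCode": "ja"}},
    {"recordId": "3", "data": {"text": "No language code given."}},
]

ok, rejected = partition_by_language(records)
print([r["recordId"] for r in ok], [r["recordId"] for r in rejected])
```

Records in the rejected list can then be logged, routed elsewhere, or skipped, depending on what the pipeline needs.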