Language detection cognitive skill

The Language Detection skill detects the language of input text and reports a single language code for every document submitted on the request. The language code is paired with a score indicating the strength of the analysis. This skill uses the machine learning models provided by Text Analytics in Cognitive Services.

This capability is especially useful when you need to provide the language of the text as input to other skills (for example, the Sentiment Analysis skill or the Text Split skill).

Language detection leverages Bing's natural language processing libraries, which cover more languages and regions than the supported list published for Text Analytics. The exact list of languages is not published, but it includes all widely spoken languages, plus variants, dialects, and some regional and cultural languages. If you have content expressed in a less frequently used language, you can try the Language Detection API to see if it returns a code. The response for languages that cannot be detected is unknown.

Note

As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Cognitive Services resource. Charges accrue when calling APIs in Cognitive Services, and for image extraction as part of the document-cracking stage in Azure Cognitive Search. There are no charges for text extraction from documents.

Execution of built-in skills is charged at the existing Cognitive Services pay-as-you-go price. Image extraction pricing is described on the Azure Cognitive Search pricing page.

@odata.type

Microsoft.Skills.Text.LanguageDetectionSkill

Data limits

The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the language detection skill, you can use the Text Split skill.
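To make the limit concrete, here is a minimal Python sketch of a client-side pre-check. It is not the Text Split skill and not part of the service; the function names are hypothetical. It assumes that String.Length counts UTF-16 code units (as it does in .NET), so characters outside the Basic Multilingual Plane count as two.

```python
from typing import List


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units, which is what .NET's String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def split_for_language_detection(text: str, limit: int = 50_000) -> List[str]:
    """Split text into chunks whose UTF-16 length does not exceed `limit`.

    Hypothetical helper: it cuts at arbitrary character positions and makes
    no attempt to respect sentence or word boundaries.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for ch in text:
        # Characters above U+FFFF need a surrogate pair (two code units).
        ch_len = 2 if ord(ch) > 0xFFFF else 1
        if current and current_len + ch_len > limit:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(ch)
        current_len += ch_len
    if current:
        chunks.append("".join(current))
    return chunks
```

Because language detection only needs a representative sample of the text, cutting mid-sentence is usually acceptable here; a splitter that feeds other skills would more likely want to break on sentence boundaries.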

Skill inputs

Parameters are case-sensitive.

Inputs          Description
text            The text to be analyzed.

Skill outputs

Output name     Description
languageCode    The ISO 639-1 language code for the language identified. For example, "en".
languageName    The name of the language. For example, "English".
score           A value between 0 and 1 that expresses the likelihood that the language is correctly identified. The score may be lower than 1 if the sentence has mixed languages.

Sample definition

 {
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "inputs": [
      {
        "name": "text",
        "source": "/document/text"
      }
    ],
    "outputs": [
      {
        "name": "languageCode",
        "targetName": "myLanguageCode"
      },
      {
        "name": "languageName",
        "targetName": "myLanguageName"
      },
      {
        "name": "score",
        "targetName": "myLanguageScore"
      }
    ]
  }
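As noted earlier, the detected language is often passed on to other skills. The following Python sketch builds a two-skill list in which the language code produced by the definition above is wired into a sentiment skill. The sentiment skill's type name, its input names, and the target name mySentimentScore are assumptions for illustration; check the reference page of the downstream skill before relying on them. The sketch also assumes both skills run in the default /document context, so an output with target name myLanguageCode becomes available at /document/myLanguageCode.

```python
import json

# The language detection skill, mirroring the sample definition.
language_skill = {
    "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
    "inputs": [{"name": "text", "source": "/document/text"}],
    "outputs": [
        {"name": "languageCode", "targetName": "myLanguageCode"},
        {"name": "languageName", "targetName": "myLanguageName"},
        {"name": "score", "targetName": "myLanguageScore"},
    ],
}

# Assumed shape of a downstream skill that accepts a language code.
sentiment_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SentimentSkill",
    "inputs": [
        {"name": "text", "source": "/document/text"},
        # Consume the output of the language detection skill.
        {"name": "languageCode", "source": "/document/myLanguageCode"},
    ],
    "outputs": [{"name": "score", "targetName": "mySentimentScore"}],
}

skills = [language_skill, sentiment_skill]

if __name__ == "__main__":
    print(json.dumps(skills, indent=2))
```

The key point is that the source path of the downstream input must match the targetName chosen in the language detection outputs.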

Sample input

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "Glaciers are huge rivers of ice that ooze their way over land, powered by gravity and their own sheer weight. "
           }
      },
      {
        "recordId": "2",
        "data":
           {
             "text": "Estamos muy felices de estar con ustedes."
           }
      }
    ]
}

Sample output

{
    "values": [
      {
        "recordId": "1",
        "data":
            {
              "languageCode": "en",
              "languageName": "English",
              "score": 1
            }
      },
      {
        "recordId": "2",
        "data":
            {
              "languageCode": "es",
              "languageName": "Spanish",
              "score": 1
            }
      }
    ]
}
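A response of this shape can be consumed with a few lines of Python. The sketch below is illustrative only: the function name and the 0.8 threshold are hypothetical choices, not values defined by the skill. It maps each recordId to its language code and flags records whose score falls below the threshold, which can happen with mixed-language text.

```python
import json
from typing import Dict, Tuple

SAMPLE_OUTPUT = """
{
  "values": [
    {"recordId": "1",
     "data": {"languageCode": "en", "languageName": "English", "score": 1}},
    {"recordId": "2",
     "data": {"languageCode": "es", "languageName": "Spanish", "score": 1}}
  ]
}
"""


def summarize(response: dict, min_score: float = 0.8) -> Dict[str, Tuple[str, bool]]:
    """Map recordId -> (languageCode, is_confident).

    `min_score` is an arbitrary cut-off chosen for this example.
    """
    result = {}
    for record in response.get("values", []):
        data = record.get("data", {})
        code = data.get("languageCode", "")
        score = float(data.get("score", 0))
        result[record["recordId"]] = (code, score >= min_score)
    return result


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE_OUTPUT)))
```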

Error cases

If text is expressed in an unsupported language, an error is generated and no language identifier is returned.

See also