Language and region support for LUIS

LUIS has a variety of features within the service. Not all features are available in every language. Make sure the features you are interested in are supported in the language culture you are targeting. A LUIS app is culture-specific, and its culture cannot be changed once it is set.

Multi-language LUIS apps

If you need a multi-language LUIS client application such as a chatbot, you have a few options. If LUIS supports all the languages, you develop a LUIS app for each language. Each LUIS app has a unique app ID and endpoint log. If you need to provide language understanding for a language LUIS does not support, you can use the Translator service to translate the utterance into a supported language, submit the utterance to the LUIS endpoint, and receive the resulting scores.
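The per-language approach boils down to a routing decision in the client. The Python below is a minimal sketch, assuming the client already knows each utterance's culture. The app IDs are hypothetical placeholders, and `translate` stands in for whatever call you make to the Translator service; neither is a real LUIS or Translator API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical app IDs: one LUIS app per supported culture (placeholders).
APP_IDS: Dict[str, str] = {
    "en-us": "<en-us-app-id>",
    "fr-fr": "<fr-fr-app-id>",
    "de-de": "<de-de-app-id>",
}

# Assumption: utterances in unsupported languages are translated into English.
FALLBACK_CULTURE = "en-us"


def route_utterance(
    utterance: str,
    culture: str,
    translate: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (app_id, text_to_send) for an utterance written in `culture`."""
    key = culture.lower()
    if key in APP_IDS:
        # A LUIS app exists for this culture: send the utterance unchanged.
        return APP_IDS[key], utterance
    # No app for this culture: translate first, then use the fallback app.
    return APP_IDS[FALLBACK_CULTURE], translate(utterance, FALLBACK_CULTURE)


def fake_translate(text: str, target: str) -> str:
    # Stand-in for a Translator call, used only to exercise the routing.
    return f"[{target}] {text}"


print(route_utterance("Bonjour", "fr-FR", fake_translate))
print(route_utterance("Habari", "sw-KE", fake_translate))
```

Keeping the culture-to-app mapping in one table means adding a language is a matter of creating the app and adding one entry.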

Languages supported

LUIS understands utterances in the following languages:

Language | Locale | Prebuilt domain | Prebuilt entity | Phrase list recommendations | **Text analytics (Sentiment and Keywords)
English (United States) en-US
Arabic (preview - modern standard Arabic) ar-AR - - - -
*Chinese zh-cn -
Dutch nl-NL - -
French (France) fr-FR
French (Canada) fr-CA - - -
German de-DE
Gujarati gu-IN - - - -
Hindi hi-IN - - -
Italian it-IT
*Japanese ja-JP Key phrase only
Korean ko-KR - - Key phrase only
Marathi mr-IN - - - -
Portuguese (Brazil) pt-BR not all sub-cultures
Spanish (Spain) es-ES
Spanish (Mexico) es-MX - -
Tamil ta-IN - - - -
Telugu te-IN - - - -
Turkish tr-TR - Sentiment only

预生成实体预生成域具有不同的语言支持。Language support varies for prebuilt entities and prebuilt domains.

*中文支持说明*Chinese support notes

  • zh-CN 区域性中,LUIS 要求简体中文字符集,而不是繁体字符集。In the zh-CN culture, LUIS expects the simplified Chinese character set instead of the traditional character set.
  • 意向、实体、功能和正则表达式的名称可采用中文或罗马字符。The names of intents, entities, features, and regular expressions may be in Chinese or Roman characters.
  • 请参阅预生成域参考,了解 zh-CN 区域性支持的预生成域。See the prebuilt domains reference for information on which prebuilt domains are supported in the zh-CN culture.

*Japanese support notes

  • Because LUIS does not provide syntactic analysis and will not understand the difference between Keigo and informal Japanese, you need to incorporate the different levels of formality as training examples for your applications.
    • でございます is not the same as です.
    • です is not the same as だ.

**Text analytics support notes

Text analytics includes the keyPhrase prebuilt entity and sentiment analysis. Only Portuguese is supported for subcultures: pt-PT and pt-BR. All other cultures are supported at the primary culture level. Learn more about Text Analytics supported languages.
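The subculture rule can be expressed as a small lookup. This is an illustrative Python sketch, not part of any SDK: it only normalizes the culture code and does not check whether Text Analytics supports the resulting language at all (several cultures in the table above have no text analytics support).

```python
# Cultures that Text Analytics handles at the subculture level (per the note above).
SUBCULTURES = {"pt-pt": "pt-PT", "pt-br": "pt-BR"}


def text_analytics_language(culture: str) -> str:
    """Reduce a LUIS culture to the level at which Text Analytics supports it."""
    normalized = culture.lower()
    if normalized in SUBCULTURES:
        # Portuguese keeps its subculture.
        return SUBCULTURES[normalized]
    # Every other culture falls back to its primary language code.
    return normalized.split("-")[0]


for culture in ("pt-BR", "fr-CA", "es-MX"):
    print(culture, "->", text_analytics_language(culture))
```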

Speech API supported languages

See Speech supported languages for Speech dictation mode languages.

Rare or foreign words in an application

en-us 区域性中,LUIS 可学习区分大多数英文字词,包括俚语。In the en-us culture, LUIS learns to distinguish most English words, including slang. zh-cn 区域性中,LUIS 可学习区分大多数中文字符。In the zh-cn culture, LUIS learns to distinguish most Chinese characters. 如果在 en-uszh-cn 中使用一个罕见字词或字符,并且 LUIS 似乎无法识别该字词或字符,则可将该字词或字符添加到短语列表功能If you use a rare word in en-us or character in zh-cn, and you see that LUIS seems unable to distinguish that word or character, you can add that word or character to a phrase-list feature. 例如,应将超出应用程序区域性的字词(即外来字词)添加到短语列表功能。For example, words outside of the culture of the application -- that is, foreign words -- should be added to a phrase-list feature. 应将此短语列表标记为不可互换,以指示罕见字词集组成 LUIS 应学会识别的类,但它们不是同义词,也不能彼此互换。This phrase list should be marked non-interchangeable, to indicate that the set of rare words forms a class that LUIS should learn to recognize, but they are not synonyms or interchangeable with each other.

Hybrid languages

Hybrid languages combine words from two cultures, such as English and Chinese. LUIS does not support these languages because an app is based on a single culture.
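If a client may receive mixed English and Chinese input, it can flag such utterances before choosing an app. This is a rough Python heuristic, not a LUIS feature: it only looks at ASCII letters and the basic CJK Unified Ideographs block, so it misses many other scripts and mixtures.

```python
def is_hybrid_utterance(text: str) -> bool:
    """Heuristic: True if text mixes ASCII Latin letters with CJK ideographs."""
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    # Only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is checked.
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return has_latin and has_cjk


for sample in ("Book a flight", "帮我订一张机票", "帮我 book 一张机票"):
    print(sample, is_hybrid_utterance(sample))
```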

Tokenization

To perform machine learning, LUIS breaks an utterance into tokens based on culture.
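The two basic strategies named in the table that follows can be illustrated in Python. This is a simplified sketch, not the tokenizer LUIS runs. Note that the space-based version keeps a Chinese phrase such as 订一张机票 as a single token, because Python's `\w` also matches CJK characters; that is why character-level splitting is needed for such languages.

```python
import re
from typing import List


def tokenize_on_spaces_and_specials(utterance: str) -> List[str]:
    # Runs of word characters form tokens; each punctuation mark or symbol
    # becomes its own token; whitespace is discarded.
    return re.findall(r"\w+|[^\w\s]", utterance)


def tokenize_by_character(utterance: str) -> List[str]:
    # Every non-whitespace character is a separate token.
    return [ch for ch in utterance if not ch.isspace()]


print(tokenize_on_spaces_and_specials("Book a flight to Paris!"))
print(tokenize_by_character("订一张机票"))
```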

Language | Every space or special character | Character level | Compound words
Arabic
Chinese
Dutch
English (en-us)
French (fr-FR)
French (fr-CA)
German
Gujarati
Hindi
Italian
Japanese
Korean
Marathi
Portuguese (Brazil)
Spanish (es-ES)
Spanish (es-MX)
Tamil
Telugu
Turkish

Custom tokenizer versions

The following cultures have custom tokenizer versions:

Culture | Version | Purpose
German (de-de) | 1.0.0 | Tokenizes words by splitting them using a machine learning-based tokenizer that tries to break down composite words into their single components. If a user enters Ich fahre einen krankenwagen as an utterance, it is turned into Ich fahre einen kranken wagen. This allows kranken and wagen to be marked independently as different entities.
German (de-de) | 1.0.2 | Tokenizes words by splitting them on spaces. If a user enters Ich fahre einen krankenwagen as an utterance, krankenwagen remains a single token. Thus krankenwagen is marked as a single entity.
Dutch (nl-nl) | 1.0.0 | Tokenizes words by splitting them using a machine learning-based tokenizer that tries to break down composite words into their single components. If a user enters Ik ga naar de kleuterschool as an utterance, it is turned into Ik ga naar de kleuter school. This allows kleuter and school to be marked independently as different entities.
Dutch (nl-nl) | 1.0.1 | Tokenizes words by splitting them on spaces. If a user enters Ik ga naar de kleuterschool as an utterance, kleuterschool remains a single token. Thus kleuterschool is marked as a single entity.
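To make the difference between the versions concrete, here is a toy Python sketch. The real 1.0.0 tokenizers use a machine-learned model; a hard-coded vocabulary (`KNOWN_PARTS`, hypothetical) stands in for it here, and only two-part compounds are handled.

```python
from typing import List, Set

# Toy vocabulary standing in for the machine-learned model (hypothetical).
KNOWN_PARTS: Set[str] = {"kranken", "wagen", "kleuter", "school"}


def split_compound(word: str, parts: Set[str] = KNOWN_PARTS) -> List[str]:
    """Split word into two known parts if possible; otherwise keep it whole."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head.lower() in parts and tail.lower() in parts:
            return [head, tail]
    return [word]


def tokenize(utterance: str, version: str) -> List[str]:
    tokens = utterance.split()
    if version == "1.0.0":
        # Compound-splitting behavior of the 1.0.0 tokenizers.
        return [piece for token in tokens for piece in split_compound(token)]
    # 1.0.2 (de-de) and 1.0.1 (nl-nl): split on spaces only.
    return tokens


print(tokenize("Ich fahre einen krankenwagen", "1.0.0"))
print(tokenize("Ich fahre einen krankenwagen", "1.0.2"))
```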

Migrating between tokenizer versions

Tokenization happens at the app level. There is no support for version-level tokenization.

Import the file as a new app, instead of as a version. This means the new app has a different app ID but uses the tokenizer version specified in the file.
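Because the tokenizer version travels with the app file, switching versions amounts to editing that file and importing it as a new app. A minimal Python sketch, assuming the file is LUIS app JSON with a top-level `tokenizerVersion` field (check this against your own exported file; the sample content below is made up):

```python
import json


def retarget_tokenizer(exported_json: str, version: str) -> str:
    """Return a copy of an exported app definition with another tokenizer version.

    Assumes the version is stored in a top-level "tokenizerVersion" field.
    """
    app = json.loads(exported_json)
    app["tokenizerVersion"] = version
    return json.dumps(app, ensure_ascii=False, indent=2)


exported = '{"name": "MyApp", "culture": "de-de", "tokenizerVersion": "1.0.0"}'
print(retarget_tokenizer(exported, "1.0.2"))
```

Importing the edited file creates a new app with its own app ID; the original app keeps its tokenizer.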