Understand what good utterances are for your LUIS app

Utterances are input from the user that your app needs to interpret. To train LUIS to extract intents and entities from them, it's important to capture a variety of different example utterances for each intent. Active learning, the process of continuing to train on new utterances, is essential to the machine-learning intelligence that LUIS provides.

Collect utterances that you think users will enter. Include utterances that mean the same thing but are constructed in a variety of different ways:

  • Utterance length - short, medium, and long for your client application
  • Word and phrase length
  • Word placement - entity at the beginning, middle, and end of the utterance
  • Grammar
  • Pluralization
  • Stemming
  • Noun and verb choice
  • Punctuation - a good variety, used with correct, incorrect, and no grammar

How to choose varied utterances

When you first start adding example utterances to your LUIS model, keep the following principles in mind.

Utterances aren't always well formed

An utterance may be a sentence, like "Book a ticket to Paris for me", or a fragment of a sentence, like "Booking" or "Paris flight." Users often make spelling mistakes.

You should train LUIS on utterances that include typos and misspellings.

Use the representative language of the user

When choosing utterances, be aware that what you think is a common term or phrase might not be familiar to the typical user of your client application. They may not have domain experience. Be careful when using terms or phrases that a user would only say if they were an expert.

Choose varied terminology as well as phrasing

You will find that even if you make an effort to create varied sentence patterns, you will still repeat some vocabulary.

Take these example utterances:

Example utterances
How do I get a computer?
Where do I get a computer?
I want to get a computer, how do I go about it?
When can I have a computer?

The core term here, "computer," isn't varied. Use alternatives such as desktop computer, laptop, workstation, or even just machine. LUIS can intelligently infer synonyms from context, but when you create utterances for training, it's always better to vary them.
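Repeated vocabulary of this kind can be spotted before training. The following is a minimal sketch, not part of LUIS: the `repeated_terms` helper and the stop-word list are hypothetical, and the tokenization is deliberately naive.

```python
from collections import Counter
import re

def repeated_terms(utterances, min_share=1.0, stop_words=frozenset()):
    """Return words that appear in at least `min_share` of the utterances."""
    counts = Counter()
    for text in utterances:
        # Count each word once per utterance, ignoring case and punctuation.
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(words - stop_words)
    threshold = min_share * len(utterances)
    return sorted(word for word, n in counts.items() if n >= threshold)

examples = [
    "How do I get a computer?",
    "Where do I get a computer?",
    "I want to get a computer, how do I go about it?",
    "When can I have a computer?",
]
# Hypothetical stop-word list; tune it for your own domain.
STOP_WORDS = frozenset({"a", "i", "do", "how", "to", "the"})

print(repeated_terms(examples, stop_words=STOP_WORDS))  # ['computer']
```

A term that shows up in every example utterance of an intent is a candidate for substitution with alternatives.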

Example utterances in each intent

Each intent needs to have example utterances, at least 15. If you have an intent that does not have any example utterances, you will not be able to train LUIS. If you have an intent with one or very few example utterances, LUIS may not accurately predict the intent.

Add small groups of 15 utterances for each authoring iteration

In each iteration of the model, do not add a large quantity of utterances. Add utterances in quantities of 15. Train, publish, and test again.

LUIS builds effective models with utterances that are carefully selected by the LUIS model author. Adding too many utterances isn't valuable because it introduces confusion.

It is better to start with a few utterances, then review endpoint utterances for correct intent prediction and entity extraction.
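If candidate utterances are kept in a list, the iteration size can be enforced with a small helper. This is only a sketch: `batches` is a hypothetical name, and the train, publish, and test steps between groups are left to the author.

```python
def batches(utterances, size=15):
    """Yield consecutive groups of at most `size` utterances."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(utterances), size):
        yield utterances[start:start + size]

# Placeholder utterances for illustration only.
candidates = [f"example utterance {i}" for i in range(40)]
groups = list(batches(candidates))
print([len(group) for group in groups])  # [15, 15, 10]
```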

Utterance normalization

Utterance normalization is the process of ignoring the effects of types of text, such as punctuation and diacritics, during training and prediction.

The utterance normalization settings are turned off by default. These settings include:

  • Word forms
  • Diacritics
  • Punctuation

If you turn on a normalization setting, scores in the Test pane, batch tests, and endpoint queries will change for all utterances for that normalization setting.

When you clone a version in the LUIS portal, the version settings carry over to the new cloned version.

Set the version settings in the LUIS portal, in the Manage section on the Application Settings page, or with the Update Version Settings API. Learn more about these normalization changes in the Reference.

Word forms

Normalizing word forms ignores the differences in words that expand beyond the root.


Diacritics

Diacritics are marks or signs within the text, such as:

İ ı Ş Ğ ş ğ ö ü
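To make the idea of ignoring diacritics concrete, here is a rough illustration using Python's standard `unicodedata` module. It is an assumption-laden sketch of the general technique (Unicode decomposition followed by removal of combining marks), not a description of how LUIS implements its setting.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("ş ğ ö ü"))  # s g o u
```

The dotless ı has no Unicode decomposition, so this sketch leaves it unchanged; the replacement LUIS applies to such characters may differ.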

Punctuation marks

Normalizing punctuation means that before your models are trained and before your endpoint queries are predicted, punctuation is removed from the utterances.

Punctuation is a separate token in LUIS. An utterance that contains a period at the end versus an utterance that does not contain a period at the end are two separate utterances and may get two different predictions.
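The effect can be pictured with plain string handling. The sketch below is illustrative only: `strip_punctuation` is a hypothetical helper that removes ASCII punctuation, and it makes no claim about how LUIS tokenizes or normalizes text internally.

```python
import string

def strip_punctuation(utterance):
    """Delete ASCII punctuation and collapse any leftover whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(utterance.translate(table).split())

with_period = "I am applying for the nurse position."
without_period = "I am applying for the nurse position"

# Without normalization the two strings are different utterances.
print(with_period == without_period)  # False
# After removing punctuation they are identical.
print(strip_punctuation(with_period) == strip_punctuation(without_period))  # True
```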

If punctuation is not normalized, LUIS doesn't ignore punctuation marks by default, because some client applications may place significance on these marks. Make sure your example utterances use both punctuation and no punctuation so that both styles return the same relative scores.

Make sure the model handles punctuation either in the example utterances (with and without punctuation) or in patterns, where it is easier to ignore punctuation with the special syntax: I am applying for the {Job} position[.]

If punctuation has no specific meaning in your client application, consider ignoring punctuation by normalizing punctuation.

Ignoring words and punctuation

If you want to ignore specific words or punctuation in patterns, use a pattern with the ignore syntax of square brackets, [].
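One way to reason about the square-bracket syntax is to translate a pattern into a regular expression in which bracketed text is optional. This is a simplified sketch for intuition, not the LUIS pattern matcher: `pattern_to_regex` is hypothetical, it supports only {Entity} placeholders and [optional] text, and it assumes entity names are valid Python identifiers.

```python
import re

def pattern_to_regex(pattern):
    """Translate a simplified pattern into a compiled regular expression."""
    parts = []
    # Split into {Entity} placeholders, [optional] groups, and literal text.
    for token in re.findall(r"\{[^{}]+\}|\[[^\[\]]+\]|[^{}\[\]]+", pattern):
        if token.startswith("{"):
            # Capture the entity text lazily under the entity's name.
            parts.append(f"(?P<{token[1:-1]}>.+?)")
        elif token.startswith("["):
            # Bracketed text may be present or absent.
            parts.append(f"(?:{re.escape(token[1:-1])})?")
        else:
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)

matcher = pattern_to_regex("I am applying for the {Job} position[.]")
for text in ("I am applying for the nurse position.",
             "I am applying for the nurse position"):
    print(matcher.match(text).group("Job"))  # nurse (both times)
```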

Training with all utterances

Training is generally non-deterministic: the utterance prediction could vary slightly across versions or apps. You can remove non-deterministic training by updating the version settings API with the UseAllTrainingData name/value pair to use all training data.
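As a rough sketch, the request body for that update might be built as follows. Only the UseAllTrainingData name comes from the text above; the list-of-pairs shape and the string value "true" are assumptions to verify against the Update Version Settings API reference, and no request is actually sent here.

```python
import json

def use_all_training_data_setting(enabled=True):
    """Build a name/value list for the version settings update (assumed shape)."""
    value = "true" if enabled else "false"
    return [{"name": "UseAllTrainingData", "value": value}]

body = json.dumps(use_all_training_data_setting())
print(body)  # [{"name": "UseAllTrainingData", "value": "true"}]
```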

Testing utterances

Developers should start testing their LUIS application with real traffic by sending utterances to the prediction endpoint URL. These utterances are used to improve the performance of the intents and entities with Review utterances. Tests submitted with the LUIS website testing pane are not sent through the endpoint, and so do not contribute to active learning.
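The sketch below only assembles a prediction URL; it does not send it. The host, app ID, and key are placeholders, and the path and query parameter names follow the V2-style endpoint as an assumption, so copy the real URL from your app's publish settings rather than relying on this shape.

```python
from urllib.parse import urlencode

def prediction_url(endpoint, app_id, key, utterance):
    """Assemble a prediction URL (assumed V2-style path and parameters)."""
    query = urlencode({"subscription-key": key, "q": utterance})
    return f"{endpoint.rstrip('/')}/luis/v2.0/apps/{app_id}?{query}"

# All three values below are placeholders, not real resources.
url = prediction_url(
    "https://example-region.api.cognitive.microsoft.com",
    "YOUR-APP-ID",
    "YOUR-KEY",
    "Book a ticket to Paris for me",
)
print(url)
```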

Review utterances

After your model is trained, published, and receiving endpoint queries, review the utterances suggested by LUIS. LUIS selects endpoint utterances that have low scores for either the intent or entity.
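LUIS makes this selection itself, but the idea can be sketched for logs you keep on your own side. Everything here is hypothetical: the record fields, the `needs_review` helper, and the 0.5 threshold are illustrative choices, not LUIS values.

```python
def needs_review(records, threshold=0.5):
    """Return utterances whose intent or any entity score is below `threshold`."""
    flagged = []
    for record in records:
        scores = [record["intent_score"], *record.get("entity_scores", [])]
        if min(scores) < threshold:
            flagged.append(record["text"])
    return flagged

# Made-up scores for illustration only.
logged = [
    {"text": "Book a ticket to Paris for me", "intent_score": 0.97, "entity_scores": [0.91]},
    {"text": "Paris flight", "intent_score": 0.42, "entity_scores": [0.88]},
    {"text": "Booking", "intent_score": 0.75, "entity_scores": []},
]
print(needs_review(logged))  # ['Paris flight']
```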

Best practices

Review best practices and apply them as part of your regular authoring cycle.

Label for word meaning

If the word choice or word arrangement is the same, but doesn't mean the same thing, do not label it with the entity.

In the following utterances, the word fair is a homograph. It is spelled the same but has a different meaning:

What kind of county fairs are happening in the Seattle area this summer?
Is the current rating for the Seattle review fair?

If you wanted an event entity to find all event data, label the word fair in the first utterance, but not in the second.
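Expressed as data, the labeling decision might look like the sketch below. The `label_example` helper, the field names, and the "Event" entity name are hypothetical and do not reflect the LUIS labeling API format; the point is that the label is attached per utterance according to meaning, not wherever the string occurs.

```python
def label_example(text, phrase=None, entity="Event"):
    """Return a labeled example; `phrase` is the text to tag, if any."""
    labels = []
    if phrase is not None:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in utterance")
        # Inclusive character offsets of the labeled phrase.
        labels.append({"entity": entity, "start": start, "end": start + len(phrase) - 1})
    return {"text": text, "entities": labels}

# Labeled: here the word names an event.
event = label_example(
    "What kind of county fairs are happening in the Seattle area this summer?", "fairs")
# Not labeled: here "fair" means "just", so no entity is attached.
rating = label_example("Is the current rating for the Seattle review fair?")

print(event["entities"])   # [{'entity': 'Event', 'start': 20, 'end': 24}]
print(rating["entities"])  # []
```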

Next steps

See Add example utterances for information on training a LUIS app to understand user utterances.