Understand what good utterances are for your LUIS app

Utterances are input from the user that your app needs to interpret. To train LUIS to extract intents and entities from them, it's important to capture a variety of different example utterances for each intent. Active learning, the process of continuing to train on new utterances, is essential to the machine-learned intelligence that LUIS provides.

Collect utterances that you think users will enter. Include utterances that mean the same thing but are constructed in a variety of different ways:

  • Utterance length - short, medium, and long, as appropriate for your client application
  • Word and phrase length
  • Word placement - entity at the beginning, middle, and end of the utterance
  • Grammar
  • Pluralization
  • Stemming
  • Noun and verb choice
  • Punctuation - a good variety, covering correct grammar, incorrect grammar, and no grammar
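
One way to check a candidate set against the list above is a quick tally of length and punctuation styles. The following is a minimal sketch, not a LUIS feature: the function names and the word-count cut-offs (4 and 10) are assumptions chosen only for illustration.

```python
import string
from collections import Counter


def length_bucket(utterance, short_max=4, medium_max=10):
    """Classify an utterance as short, medium, or long by word count.

    The cut-offs are arbitrary illustration values, not LUIS limits.
    """
    words = len(utterance.split())
    if words <= short_max:
        return "short"
    if words <= medium_max:
        return "medium"
    return "long"


def variety_report(utterances):
    """Count utterances per length bucket and per punctuation style."""
    lengths = Counter(length_bucket(u) for u in utterances)
    punctuated = sum(
        1 for u in utterances if any(ch in string.punctuation for ch in u)
    )
    return {
        "short": lengths["short"],
        "medium": lengths["medium"],
        "long": lengths["long"],
        "punctuated": punctuated,
        "unpunctuated": len(utterances) - punctuated,
    }


examples = [
    "Paris flight",
    "Book a ticket to Paris for me",
    "book a ticket to paris for me please, I need to leave on friday morning",
]
report = variety_report(examples)
```

A bucket that stays at zero suggests a style of utterance that the example set does not yet cover.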

How to choose varied utterances

When you first start adding example utterances to your LUIS model, here are some principles to keep in mind.

Utterances aren't always well formed

An utterance may be a sentence, like "Book a ticket to Paris for me", or a fragment of a sentence, like "Booking" or "Paris flight." Users often make spelling mistakes.

If you do not spell check user utterances, you should train LUIS on utterances that include typos and misspellings.

Use the representative language of the user

When choosing utterances, be aware that what you think is a common term or phrase might not be one that the typical user of your client application would use. They may not have domain experience. Be careful when using terms or phrases that a user would only say if they were an expert.

Choose varied terminology as well as phrasing

You will find that even if you make an effort to create varied sentence patterns, you will still repeat some vocabulary.

Take these example utterances:

Example utterances
How do I get a computer?
Where do I get a computer?
I want to get a computer, how do I go about it?
When can I have a computer?

The core term here, "computer", isn't varied. Use alternatives such as "desktop computer", "laptop", "workstation", or even just "machine". LUIS intelligently infers synonyms from context, but when you create utterances for training, it's still better to vary them.
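
Varying the core term can be sketched as a simple substitution over the example utterances. This is an illustrative helper only: `vary_core_term` and the `{item}` placeholder are assumptions made for this sketch, not part of LUIS.

```python
def vary_core_term(templates, alternatives):
    """Fill each template's {item} slot with every alternative term."""
    return [t.format(item=term) for t in templates for term in alternatives]


templates = [
    "How do I get a {item}?",
    "Where do I get a {item}?",
    "I want to get a {item}, how do I go about it?",
    "When can I have a {item}?",
]
alternatives = ["computer", "desktop computer", "laptop", "workstation", "machine"]

# A pool of candidates to choose from by hand, not a set to add wholesale.
candidates = vary_core_term(templates, alternatives)
```

The expansion produces every combination, so treat the result as a pool from which to pick a few representative utterances rather than as training data to import in bulk.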

Example utterances in each intent

Each intent needs at least 15 example utterances. If you have an intent that does not have any example utterances, you will not be able to train LUIS. If you have an intent with one or very few example utterances, LUIS will not accurately predict the intent.

Add small groups of 15 utterances for each authoring iteration

In each iteration of the model, do not add a large quantity of utterances. Add utterances in groups of 15, then train again.
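
If you keep a backlog of candidate utterances, the iteration size can be enforced with a small batching helper. This is a minimal sketch: the `batches` function and the placeholder backlog are assumptions, and the training step itself is not shown.

```python
def batches(utterances, size=15):
    """Yield consecutive groups of at most `size` utterances."""
    for start in range(0, len(utterances), size):
        yield utterances[start:start + size]


# Placeholder backlog; in practice these would be hand-picked utterances.
pending = [f"example utterance {i}" for i in range(40)]
sizes = [len(group) for group in batches(pending)]
```

Each group would be added in its own authoring iteration, with a training run between groups.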

LUIS builds effective models with utterances that are carefully selected by the LUIS model author. Adding too many utterances isn't valuable because it introduces confusion.

Utterance normalization

Utterance normalization is the process of ignoring the effects of punctuation and diacritics during training and prediction.

Utterance normalization for diacritics and punctuation

Utterance normalization is defined when you create or import the app because it is a setting in the app JSON file. The utterance normalization settings are turned off by default.

Diacritics are marks or signs within the text, such as:

İ ı Ş Ğ ş ğ ö ü

If your app turns normalization on, scores in the Test pane, batch tests, and endpoint queries will change for all utterances that use diacritics or punctuation.

To turn on utterance normalization for diacritics or punctuation, add the corresponding entries to the settings parameter of your LUIS JSON app file.

"settings": [
    {"name": "NormalizePunctuation", "value": "true"},
    {"name": "NormalizeDiacritics", "value": "true"}
] 

Normalizing punctuation means that punctuation is removed from utterances before your models are trained and before your endpoint queries are predicted.

Normalizing diacritics replaces characters that carry diacritics with regular characters. For example, "Je parle français" becomes "Je parle francais".

Normalization doesn't mean that punctuation and diacritics disappear from your example utterances or prediction responses, only that they are ignored during training and prediction.
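
The effect of the two settings can be approximated locally with the Python standard library. This is a sketch of the idea only, not LUIS's actual implementation: the function names are assumptions, only ASCII punctuation is handled, and characters that have no Unicode decomposition (such as the dotless ı) pass through unchanged.

```python
import string
import unicodedata


def normalize_diacritics(text):
    """Drop combining marks after Unicode canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_punctuation(text):
    """Remove ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


without_marks = normalize_diacritics("Je parle français")
without_punctuation = normalize_punctuation("Book a flight, please!")
```

Here `without_marks` is "Je parle francais" and `without_punctuation` is "Book a flight please", which mirrors the behavior described above.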

Punctuation marks

By default, when punctuation is not normalized, LUIS does not ignore punctuation marks, because some client applications may place significance on them. Make sure your example utterances include both punctuated and unpunctuated versions so that both styles return the same relative scores.

If punctuation has no specific meaning in your client application, consider ignoring punctuation by normalizing it.

Ignoring words and punctuation

If you want to ignore specific words or punctuation in patterns, use a pattern with the ignore syntax of square brackets, [].
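
To see what the square brackets imply, the sketch below expands a pattern into the variants it would cover when every bracketed part is treated as optional. The pattern text and the `expand_optional` helper are hypothetical illustrations, nested brackets are not handled, and LUIS does not require you to enumerate the variants yourself.

```python
import re
from itertools import product


def expand_optional(template):
    """List every variant of a template whose [bracketed] parts are optional."""
    # Split while keeping the bracketed segments as separate parts.
    parts = re.split(r"(\[[^\[\]]*\])", template)
    choices = [
        ("", part[1:-1]) if part.startswith("[") else (part,)
        for part in parts
    ]
    # Join each combination and collapse any leftover double spaces.
    variants = {" ".join("".join(combo).split()) for combo in product(*choices)}
    return sorted(variants)


# Hypothetical pattern: "please" and the trailing "?" are both ignorable.
variants = expand_optional("[please] book a flight to {city}[?]")
```

The hypothetical pattern above covers four phrasings, with and without "please" and with and without the trailing question mark.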

Training utterances

Training is generally non-deterministic: the prediction for an utterance could vary slightly across versions or apps. You can remove non-deterministic training by updating the version settings API with the UseAllTrainingData name/value pair so that all training data is used.
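
The name/value pair can be built as a small JSON body before it is sent to the version settings API. This is a minimal sketch under stated assumptions: the string value "true" mirrors the format of the normalization settings in the app JSON file, and the request URL, HTTP method, and authentication are deliberately left out because they are not given here.

```python
import json


def version_settings_payload(use_all_training_data=True):
    """Build the name/value list for the version settings update.

    Assumption: the value is sent as the string "true" or "false", following
    the format of the normalization settings in the app JSON file.
    """
    value = "true" if use_all_training_data else "false"
    return [{"name": "UseAllTrainingData", "value": value}]


body = json.dumps(version_settings_payload())
```

Consult the authoring API reference for the exact endpoint before sending `body`.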

Testing utterances

Developers should start testing their LUIS application with real traffic by sending utterances to the prediction endpoint URL. Tests submitted with the LUIS website testing pane are not sent through the endpoint, and so do not contribute to active learning.
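
A prediction request URL can be assembled as in the sketch below. The URL shape is an assumption based on the V3 prediction endpoint, and the host name and app ID are placeholders; copy the real endpoint URL from your own LUIS resource. The subscription key is omitted and would have to be supplied with the actual request.

```python
from urllib.parse import urlencode


def prediction_url(host, app_id, utterance, slot="production"):
    """Build a prediction URL for one utterance (assumed V3 URL shape)."""
    # log=true asks for the query to be logged so it can be reviewed later.
    params = urlencode({"query": utterance, "log": "true"})
    return (
        f"https://{host}/luis/prediction/v3.0/apps/{app_id}"
        f"/slots/{slot}/predict?{params}"
    )


# Placeholder host and app ID, for illustration only.
url = prediction_url(
    "example.cognitiveservices.azure.com",
    "00000000-0000-0000-0000-000000000000",
    "Book a ticket to Paris",
)
```

Only utterances sent to the endpoint this way, rather than through the testing pane, contribute to active learning.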

Review utterances

LUIS selects endpoint utterances that have low scores for either the intent or the entity and suggests them for review.

Best practices

Review best practices and apply them as part of your regular authoring cycle.