Best practices for building a language understanding (LUIS) app

Use the app authoring process to build your LUIS app:

  • Build language models (intents and entities)
  • Add a few training example utterances (15-30 per intent)
  • Publish to the endpoint
  • Test from the endpoint

Once your app is published, use the development lifecycle to add features, publish, and test from the endpoint. Do not begin the next authoring cycle by adding more example utterances. That does not let LUIS learn your model from real-world user utterances.

Do not expand the utterances until the current set of both example and endpoint utterances returns confident, high prediction scores. Improve scores using active learning.

Do and Don't

The following list includes best practices for LUIS apps:

Do:
  • Plan your schema
  • Define distinct intents
  • Add features to intents
  • Use machine learned entities
  • Find a sweet spot between too generic and too specific for each intent
  • Build your app iteratively with versions
  • Build entities for model decomposition
  • Add patterns in later iterations
  • Balance your utterances across all intents except the None intent
  • Add example utterances to the None intent
  • Leverage the suggest feature for active learning
  • Monitor the performance of your app with batch testing

Don't:
  • Build and publish without a plan
  • Add many example utterances to intents
  • Use few or simple entities
  • Use LUIS as a training platform
  • Add many example utterances of the same format, ignoring other formats
  • Mix the definition of intents and entities
  • Create phrase lists with all possible values
  • Add too many patterns
  • Train and publish with every single example utterance added

Do plan your schema

Before you start building your app's schema, identify what you plan to use this app for and where. The more thorough and specific your planning, the better your app becomes.

  • Research your target users
  • Define end-to-end personas to represent your app: voice, avatar, issue handling (proactive, reactive)
  • Identify user interactions (text, speech) and the channels they arrive through, and decide whether to hand off to existing solutions or create a new solution for this app
  • End-to-end user journey
    • What do you expect this app to do and not do?
    • What are the priorities of what it should do?
    • What are the main use cases?
  • Collecting data: learn about collecting and preparing data

Do define distinct intents

Make sure the vocabulary for each intent is specific to that intent and doesn't overlap with another intent. For example, if you want an app that handles travel arrangements such as airline flights and hotels, you can treat these subject areas as separate intents, or as a single intent with entities for the specific data inside the utterance.

If the vocabulary of two intents is the same, combine the intents and use entities.

Consider the following example utterances:

Example utterances:
  • Book a flight
  • Book a hotel

"Book a flight" and "Book a hotel" use the same vocabulary, "book a". Because the format is the same, they should be the same intent, with the differing words "flight" and "hotel" extracted as entities.
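One rough way to spot this situation before merging intents is to compare the word sets of their example utterances. The sketch below is plain Python and is not part of LUIS. The helper names and the use of Jaccard overlap as a signal are illustrative assumptions.

```python
def vocabulary(utterances: list[str]) -> set[str]:
    """Lower-cased set of words used across a group of example utterances."""
    return {word for utterance in utterances for word in utterance.lower().split()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of words two vocabularies share, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical intents built from the two example utterances above.
book_flight = ["Book a flight"]
book_hotel = ["Book a hotel"]

flight_words, hotel_words = vocabulary(book_flight), vocabulary(book_hotel)
overlap = jaccard(flight_words, hotel_words)
# Words that appear in only one of the two intents are entity candidates.
entity_candidates = sorted(flight_words ^ hotel_words)

print(f"overlap = {overlap:.2f}")  # overlap = 0.50
print(entity_candidates)           # ['flight', 'hotel']
```

In a real app you would run this over every utterance of each intent. A high overlap where only one slot differs is a hint that one intent plus an entity fits better than two intents.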

Do add features to intents

Features describe concepts for an intent. A feature can be a phrase list of words that are significant to that intent, or an entity that is significant to that intent.

Do find the sweet spot for intents

Use prediction data from LUIS to determine whether your intents overlap. Overlapping intents confuse LUIS: the top-scoring intent ends up too close to another intent. Because LUIS does not take exactly the same path through the data each time it trains, an overlapping intent can come out first in one training run and second in the next. You want the utterance's scores for the intents to be farther apart so that this flip-flopping doesn't happen. Good distinction between intents should produce the expected top intent every time.
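A minimal sketch of checking how far apart the top two intents are for one utterance. It assumes you have already copied each intent's score into a Python dict (for example, from a prediction request that returns scores for all intents). The 0.15 threshold is an arbitrary placeholder, not a LUIS default.

```python
MIN_MARGIN = 0.15  # hypothetical threshold; tune it against your own batch test results


def top_two_margin(scores: dict[str, float]) -> tuple[str, float]:
    """Return the top-scoring intent and how far its score is ahead of the runner-up."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_intent, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_intent, top_score - runner_up_score


def is_ambiguous(scores: dict[str, float], min_margin: float = MIN_MARGIN) -> bool:
    """True when the top two intents are too close to rank consistently across trainings."""
    _, margin = top_two_margin(scores)
    return margin < min_margin


# Hypothetical scores for one utterance, copied out of a prediction response.
scores = {"BookFlight": 0.48, "BookHotel": 0.44, "None": 0.08}
intent, margin = top_two_margin(scores)
print(intent, round(margin, 2))  # BookFlight 0.04
print(is_ambiguous(scores))      # True
```

Utterances flagged this way point to intents whose example utterances probably need clearer separation.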

Do use machine learned entities

Machine learned entities are tailored to your app and require labeling to be successful. If you are not using machine learned entities, you might be using the wrong tool.

Machine learned entities can use other entities as features. These other entities can be custom entities, such as regular expression entities or list entities, or prebuilt entities.

Learn about effective machine learned entities.

Do build your app iteratively with versions

Each authoring cycle should happen within a new version, cloned from an existing version.

Do build for model decomposition

Model decomposition typically follows this process:

  • Create an intent based on the client app's user intentions
  • Add 15-30 example utterances based on real-world user input
  • Label the top-level data concept in the example utterances
  • Break the data concept into subentities
  • Add features to the subentities
  • Add features to the intents

Once you have created the intent and added example utterances, the following example describes entity decomposition.

Start by identifying the complete data concepts you want to extract from an utterance. This is your machine learning entity. Then decompose the phrase into its parts, which includes identifying subentities and features.

For example, if you want to extract an address, the top-level machine learning entity could be called Address. While creating the address, identify some of its subentities, such as street address, city, state, and postal code.

Continue decomposing those elements by:

  • Adding a required feature of the postal code as a regular expression entity.
  • Decomposing the street address into parts:
    • A street number, with a required feature of the prebuilt number entity.
    • A street name.
    • A street type, with a required feature of a list entity that includes words such as avenue, circle, road, and lane.
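The decomposed Address entity can be pictured as a tree. The sketch below is an illustration in plain Python, not the V3 authoring API's schema. The `Entity` class, the US ZIP code regular expression, and the street-type word set are assumptions that stand in for the regular expression entity and list entity described above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A node in a hypothetical decomposed entity tree (illustration only, not the V3 API schema)."""
    name: str
    required_feature: Optional[str] = None  # human-readable note about the required feature
    subentities: list["Entity"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> list[str]:
        """Every node as a dotted path, parent before children."""
        here = f"{prefix}.{self.name}" if prefix else self.name
        result = [here]
        for child in self.subentities:
            result.extend(child.paths(here))
        return result


# Stand-ins for the features named above; both definitions are assumptions.
POSTAL_CODE = re.compile(r"^\d{5}(-\d{4})?$")       # regular expression entity (US ZIP format assumed)
STREET_TYPES = {"avenue", "circle", "road", "lane"}  # list entity

address = Entity("Address", subentities=[
    Entity("StreetAddress", subentities=[
        Entity("StreetNumber", required_feature="prebuilt number entity"),
        Entity("StreetName"),
        Entity("StreetType", required_feature="list entity STREET_TYPES"),
    ]),
    Entity("City"),
    Entity("State"),
    Entity("PostalCode", required_feature="regular expression entity POSTAL_CODE"),
])

for path in address.paths():
    print(path)
print(bool(POSTAL_CODE.match("98052")), "lane" in STREET_TYPES)  # True True
```

The printed paths run from `Address` down to leaves such as `Address.StreetAddress.StreetType`, mirroring the order in which the text decomposes the concept.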

The V3 authoring API allows for model decomposition.

Do add patterns in later iterations

Understand how the app behaves before adding patterns, because patterns are weighted more heavily than example utterances and will skew confidence.

Once you understand how your app behaves, add patterns as they apply to your app. You do not need to add them with each iteration.

There is no harm in adding them at the beginning of your model design, but it is easier to see how each pattern changes the model after the model has been tested with utterances.

Do balance your utterances across all intents

For LUIS predictions to be accurate, the quantity of example utterances in each intent (except for the None intent) must be relatively equal.

If you have one intent with 100 example utterances and another with 20, LUIS will tend to predict the 100-utterance intent at a higher rate.
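A small sketch for spotting imbalance in your own example counts. The source only says counts should be "relatively equal", so the choice of the median as the reference point and the 50% tolerance are assumptions for illustration.

```python
from statistics import median


def unbalanced_intents(counts: dict[str, int], tolerance: float = 0.5) -> list[str]:
    """Intents (None excluded) whose example count is more than `tolerance`
    (0.5 = 50%) away from the median count across all non-None intents."""
    scored = {intent: n for intent, n in counts.items() if intent != "None"}
    if not scored:
        return []
    mid = median(scored.values())
    return sorted(intent for intent, n in scored.items() if abs(n - mid) > tolerance * mid)


# Hypothetical example counts per intent.
counts = {"BookFlight": 100, "BookHotel": 20, "CancelBooking": 25, "None": 12}
print(unbalanced_intents(counts))  # ['BookFlight']
```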

Do add example utterances to the None intent

This intent is the fallback intent, indicating everything outside your application. Add one example utterance to the None intent for every 10 example utterances in the rest of your LUIS app.
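The 1-in-10 guideline reduces to simple arithmetic. In this sketch, rounding up when the total is not a multiple of 10 is an assumption; the example counts are made up.

```python
import math


def recommended_none_count(counts: dict[str, int]) -> int:
    """One None example per 10 example utterances in all other intents, rounded up."""
    others = sum(n for intent, n in counts.items() if intent != "None")
    return math.ceil(others / 10)


# Hypothetical example counts per intent.
counts = {"BookFlight": 30, "BookHotel": 25, "CancelBooking": 20, "None": 3}
target = recommended_none_count(counts)
missing = max(0, target - counts.get("None", 0))
print(target, missing)  # 8 5
```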

Do leverage the suggest feature for active learning

Use active learning's Review endpoint utterances regularly, instead of adding more example utterances to intents. Because the app constantly receives endpoint utterances, this list keeps growing and changing.

Do monitor the performance of your app

Monitor prediction accuracy using a batch test set.

Keep a separate set of utterances that aren't used as example utterances or endpoint utterances. Keep improving the app against this test set. Adapt the test set to reflect real user utterances. Use this test set to evaluate each iteration or version of the app.
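A sketch of the two checks this implies: the test set must stay disjoint from the example and endpoint utterances, and each version is scored against it. `keyword_predict` is a hypothetical stand-in for querying one version of the app; in practice you would call the published endpoint or use the portal's batch testing instead.

```python
from typing import Callable

LabeledUtterance = tuple[str, str]  # (utterance, expected top intent)


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def assert_held_out(test_set: list[LabeledUtterance], *used_sets: list[str]) -> None:
    """Raise if any test utterance also appears among example or endpoint utterances."""
    used = {_normalize(u) for group in used_sets for u in group}
    leaked = [text for text, _ in test_set if _normalize(text) in used]
    if leaked:
        raise ValueError(f"test utterances reused elsewhere: {leaked}")


def accuracy(test_set: list[LabeledUtterance], predict: Callable[[str], str]) -> float:
    """Share of test utterances whose predicted top intent matches the expected one."""
    correct = sum(1 for text, expected in test_set if predict(text) == expected)
    return correct / len(test_set)


# Hypothetical data.
test_set = [
    ("I need two seats to Paris on Friday", "BookFlight"),
    ("Is there a room free near the airport tonight", "BookHotel"),
    ("Cancel my hotel reservation", "CancelBooking"),
    ("What's the weather like", "None"),
]
example_utterances = ["Book a flight", "Book a hotel"]
endpoint_utterances = ["Reserve two seats on the red eye to Paris next Monday"]


def keyword_predict(text: str) -> str:
    """Hypothetical stand-in for querying one version of the app."""
    t = text.lower()
    if "seat" in t or "flight" in t:
        return "BookFlight"
    if "room" in t or "hotel" in t:
        return "BookHotel"
    return "None"


assert_held_out(test_set, example_utterances, endpoint_utterances)
print(f"{accuracy(test_set, keyword_predict):.2f}")  # 0.75
```

Recording this number for every version gives you a consistent yardstick across iterations.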

Don't publish too quickly

Publishing your app too quickly, without proper planning, may lead to several issues, such as:

  • Your app will not work in your actual scenario at an acceptable level of performance.
  • The schema (intents and entities) may not be appropriate, and if you have developed client app logic following the schema, you may need to rewrite it from scratch. This causes unexpected delays and extra cost for your project.
  • Utterances you add to the model might bias the example utterance set in ways that are hard to identify and debug. They also make it difficult to remove ambiguity after you have committed to a certain schema.

Don't add many example utterances to intents

After the app is published, add only utterances from active learning in the development lifecycle process. If utterances are too similar, add a pattern.

Don't use few or simple entities

Entities are built for data extraction and prediction. It is important that each intent has machine learning entities that describe the data in the intent. This helps LUIS predict the intent, even if your client application doesn't need to use the extracted entity.

Don't use LUIS as a training platform

LUIS is specific to a language model's domain. It isn't meant to work as a general natural language training platform.

Don't add many example utterances of the same format, ignoring other formats

LUIS expects variations in an intent's utterances. The utterances can vary while keeping the same overall meaning. Variations can include utterance length, word choice, and word placement.

Don't use the same format:
  • Buy a ticket to Seattle
  • Buy a ticket to Paris
  • Buy a ticket to Orlando

Do use varying formats:
  • Buy 1 ticket to Seattle
  • Reserve two seats on the red eye to Paris next Monday
  • I would like to book 3 tickets to Orlando for spring break

The varying-format examples use different verbs (buy, reserve, book), different quantities (1, two, 3), and different word arrangements, but all have the same intention of purchasing airline tickets for travel.
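One illustrative heuristic (not a LUIS metric) for catching the "same format" problem: replace known entity values with placeholders and count how many distinct templates remain. The entity values below are hypothetical and taken from the example utterances.

```python
import re


def template(utterance: str, entity_values: dict[str, list[str]]) -> str:
    """Lower-case the utterance and swap known entity values for {placeholders}."""
    text = utterance.lower()
    for name, values in entity_values.items():
        for value in values:
            pattern = r"\b" + re.escape(value.lower()) + r"\b"
            text = re.sub(pattern, "{" + name + "}", text)
    return text


def format_diversity(utterances: list[str], entity_values: dict[str, list[str]]) -> float:
    """Distinct templates divided by utterances: near 0 = one repeated format, 1.0 = all different."""
    templates = {template(u, entity_values) for u in utterances}
    return len(templates) / len(utterances)


# Hypothetical entity values taken from the examples above.
entity_values = {"city": ["Seattle", "Paris", "Orlando"], "quantity": ["1", "two", "3"]}

same_format = ["Buy a ticket to Seattle", "Buy a ticket to Paris", "Buy a ticket to Orlando"]
varying_format = [
    "Buy 1 ticket to Seattle",
    "Reserve two seats on the red eye to Paris next Monday",
    "I would like to book 3 tickets to Orlando for spring break",
]

print(template(same_format[0], entity_values))                   # buy a ticket to {city}
print(f"{format_diversity(same_format, entity_values):.2f}")     # 0.33
print(f"{format_diversity(varying_format, entity_values):.2f}")  # 1.00
```

All three same-format utterances collapse to a single template, while the varied ones stay distinct.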

Don't mix the definition of intents and entities

Create an intent for any action your bot will take. Use entities as parameters that make that action possible.

For a bot that books airline flights, create a BookFlight intent. Do not create an intent for every airline or every destination. Use those pieces of data as entities and mark them in the example utterances.
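On the client side, this design means one handler per intent, with entities passed in as arguments. The sketch assumes the client has already reduced a prediction to a top intent name and a flat dict of entity values. The `Destination` and `Airline` entity names and "Contoso Air" are hypothetical.

```python
from typing import Callable

Entities = dict[str, str]


def book_flight(entities: Entities) -> str:
    # Airline and destination arrive as entity values, not as separate intents.
    destination = entities.get("Destination", "an unspecified destination")
    airline = entities.get("Airline", "any airline")
    return f"Booking a flight to {destination} on {airline}"


def fallback(entities: Entities) -> str:
    return "Sorry, I can't help with that yet."


# One handler per action the bot takes; new destinations or airlines need no new entries.
HANDLERS: dict[str, Callable[[Entities], str]] = {"BookFlight": book_flight}


def handle(top_intent: str, entities: Entities) -> str:
    return HANDLERS.get(top_intent, fallback)(entities)


print(handle("BookFlight", {"Destination": "Paris"}))
# Booking a flight to Paris on any airline
print(handle("BookFlight", {"Destination": "Orlando", "Airline": "Contoso Air"}))
# Booking a flight to Orlando on Contoso Air
```

Adding a new destination requires no code or schema change, whereas an intent-per-destination design would need a new handler every time.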

Don't create phrase lists with all the possible values

Provide a few examples in the phrase lists, but not every word or phrase. LUIS generalizes and takes context into account.

Don't add many patterns

Don't add too many patterns. LUIS is meant to learn quickly from fewer examples. Don't overload the system unnecessarily.

Don't train and publish with every single example utterance

Add 10 or 15 utterances before training and publishing. That lets you see the impact on prediction accuracy. Adding a single utterance may not have a visible impact on the score.
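A minimal sketch of holding utterances back until a full batch is ready. `train_and_publish` is a hypothetical callback standing in for whatever you use to train and publish (portal, SDK, or REST calls). The batch size of 10 follows the guidance above.

```python
from typing import Callable


def flush_full_batches(
    pending: list[str],
    train_and_publish: Callable[[list[str]], None],
    batch_size: int = 10,
) -> list[str]:
    """Train and publish once per full batch; return the utterances still waiting."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    while len(pending) >= batch_size:
        batch, pending = pending[:batch_size], pending[batch_size:]
        train_and_publish(batch)
    return pending


cycles: list[int] = []
leftover = flush_full_batches(
    [f"utterance {i}" for i in range(25)],
    train_and_publish=lambda batch: cycles.append(len(batch)),  # hypothetical stand-in
)
print(cycles, len(leftover))  # [10, 10] 5
```

The five leftover utterances wait for the next batch instead of triggering a training cycle on their own.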

Next steps