Best practices for building a language understanding (LUIS) app

Use the app authoring process to build your LUIS app:

  • Build language models (intents and entities)
  • Add a few training example utterances (15-30 per intent)
  • Publish to the endpoint
  • Test from the endpoint

Once your app is published, use the development lifecycle to add features, publish, and test from the endpoint. Do not begin the next authoring cycle by adding more example utterances, because that does not let LUIS learn your model from real-world user utterances.

Do not expand the utterances until the current set of both example and endpoint utterances returns confident, high prediction scores. Improve scores using active learning.

Do and Don't

The following list includes best practices for LUIS apps:

Do | Don't
Define distinct intents | Add many example utterances to intents
Find a sweet spot between too generic and too specific for each intent | Use LUIS as a training platform
Build your app iteratively | Add many example utterances of the same format, ignoring other formats
Add phrase lists and patterns in later iterations | Mix the definition of intents and entities
Balance your utterances across all intents except the None intent; add example utterances to the None intent | Create phrase lists with all possible values
Leverage the suggest feature for active learning | Add too many patterns
Monitor the performance of your app | Train and publish with every single example utterance added
Use versions for each app iteration |

Do define distinct intents

Make sure the vocabulary for each intent is specific to that intent and does not overlap with a different intent. For example, if you want an app that handles travel arrangements such as airline flights and hotels, you can choose to model these subject areas as separate intents, or as the same intent with entities for the specific data inside the utterance.

If the vocabulary between two intents is the same, combine the intents and use entities.

Consider the following example utterances:

Example utterances
Book a flight
Book a hotel

"Book a flight" and "Book a hotel" use the same vocabulary of "book a". The format is the same, so they should be the same intent, with the differing words "flight" and "hotel" extracted as entities.

Do add descriptors to intents

Descriptors help describe features for an intent. A descriptor can be a phrase list of words that are significant to that intent, or an entity that is significant to that intent.

Do find the sweet spot for intents

Use prediction data from LUIS to determine whether your intents overlap. Overlapping intents confuse LUIS. The result is that the top-scoring intent is too close to another intent. Because LUIS does not take the exact same path through the data each time it trains, an overlapping intent has a chance of being first or second after training. You want the utterance's scores for the two intents to be farther apart so that this flip-flop doesn't happen. Good distinction between intents should produce the expected top intent every time.
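
One way to spot overlap is to compare the top two intent scores for each utterance. The sketch below is only an illustration: it assumes the scores are already available as a plain mapping of intent name to score (not the actual LUIS response shape), and the `min_margin` threshold of 0.15 is an arbitrary value chosen for the example, not a LUIS recommendation.

```python
def top_two_margin(intent_scores):
    """Return (top_intent, runner_up, margin) for a mapping of intent -> score."""
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two intents to compare")
    (top, top_score), (second, second_score) = ranked[0], ranked[1]
    return top, second, top_score - second_score


def is_overlapping(intent_scores, min_margin=0.15):
    """Flag an utterance whose top two intents score too close together."""
    # min_margin is a hypothetical threshold; tune it against your own data.
    _, _, margin = top_two_margin(intent_scores)
    return margin < min_margin


# Hypothetical scores for the utterance "Book a room near the airport".
scores = {"BookFlight": 0.52, "BookHotel": 0.48, "None": 0.03}
top, second, _ = top_two_margin(scores)
print(top, second, is_overlapping(scores))
```

Utterances flagged this way are candidates for merging the two intents or for adding features that separate them.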

Do build your app iteratively with versions

Each authoring cycle should take place within a new version, cloned from an existing version.

Do build for model decomposition

Model decomposition typically follows this process:

  • Create an intent based on the client app's user intentions
  • Add 15-30 example utterances based on real-world user input
  • Label the top-level data concept in each example utterance
  • Break the data concept into subcomponents
  • Add descriptors (features) to the subcomponents
  • Add descriptors (features) to the intent

Once you have created the intent and added example utterances, the following example describes entity decomposition.

Start by identifying the complete data concepts you want to extract from an utterance. This is your machine-learned entity. Then decompose the phrase into its parts. This includes identifying subcomponents (as entities), along with descriptors and constraints.

For example, if you want to extract an address, the top-level machine-learned entity could be called Address. While creating the address, identify some of its subcomponents, such as street address, city, state, and postal code.

Continue decomposing those elements by constraining the postal code to a regular expression. Decompose the street address into a street number (using a prebuilt number), a street name, and a street type. The street type can be described with a descriptor list such as avenue, circle, road, and lane.
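
The shape of this decomposition can be sketched in plain Python. This is not how LUIS extracts entities (LUIS learns them from labeled utterances); it only mirrors the hierarchy described above. The postal-code pattern (US ZIP or ZIP+4), the field names, and the `decompose_street_address` helper are all assumptions made for the example.

```python
import re

# Hypothetical constraint for the postal-code subcomponent: US ZIP or ZIP+4.
POSTAL_CODE = re.compile(r"^\d{5}(-\d{4})?$")

# Descriptor list for the street-type subcomponent, taken from the text above.
STREET_TYPES = {"avenue", "circle", "road", "lane"}


def decompose_street_address(street_address):
    """Split '123 Main Road' into number, name, and type, or return None."""
    parts = street_address.split()
    # Expect at least: <number> <name...> <type>
    if len(parts) < 3 or not parts[0].isdecimal():
        return None
    street_type = parts[-1].lower()
    if street_type not in STREET_TYPES:
        return None
    return {
        "streetNumber": int(parts[0]),
        "streetName": " ".join(parts[1:-1]),
        "streetType": street_type,
    }


print(decompose_street_address("123 Main Road"))
print(bool(POSTAL_CODE.match("98052")))
```

Each key in the returned dictionary corresponds to one subcomponent of the Address entity; the regular expression and the street-type set play the roles of the constraint and the descriptor list.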

The V3 authoring API allows for model decomposition.

Do add patterns in later iterations

You should understand how the app behaves before adding patterns, because patterns are weighted more heavily than example utterances and will skew confidence.

Once you understand how your app behaves, add patterns as they apply to your app. You do not need to add them with each iteration.

There is no harm in adding them at the beginning of your model design, but it is easier to see how each pattern changes the model after the model has been tested with utterances.

Do balance your utterances across all intents

For LUIS predictions to be accurate, the quantity of example utterances in each intent (except for the None intent) must be relatively equal.

If you have an intent with 100 example utterances and an intent with 20 example utterances, the 100-utterance intent will have a higher rate of prediction.
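
A quick balance check can be run over your own utterance counts before training. The sketch below assumes the counts are kept in a plain dictionary keyed by intent name; the `imbalance_ratio` helper is hypothetical, and what ratio counts as "relatively equal" is left to your judgment.

```python
def imbalance_ratio(utterance_counts, ignore=("None",)):
    """Return largest / smallest utterance count, skipping the ignored intents."""
    counts = [n for intent, n in utterance_counts.items() if intent not in ignore]
    if not counts or min(counts) <= 0:
        raise ValueError("every counted intent needs at least one utterance")
    return max(counts) / min(counts)


# The 100-versus-20 situation described above.
counts = {"BookFlight": 100, "CancelFlight": 20, "None": 12}
print(imbalance_ratio(counts))  # 5.0
```

A ratio close to 1 means the intents are balanced; a ratio of 5, as here, suggests the smaller intent needs more examples or the larger one needs fewer.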

Do add example utterances to the None intent

This intent is the fallback intent, indicating everything outside your application. Add one example utterance to the None intent for every 10 example utterances in the rest of your LUIS app.
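
The one-in-ten guideline is simple arithmetic. As a minimal sketch, assuming utterance counts are kept in a dictionary keyed by intent name (the helper name and the choice to round down are assumptions of this example):

```python
def recommended_none_count(utterance_counts, none_intent="None"):
    """Return how many None examples the one-per-ten guideline suggests."""
    other_total = sum(
        n for intent, n in utterance_counts.items() if intent != none_intent
    )
    # Integer division rounds down; the rounding direction is an assumption.
    return other_total // 10


counts = {"BookFlight": 60, "CancelFlight": 60, "None": 5}
target = recommended_none_count(counts)
print(target)                   # 12
print(target - counts["None"])  # 7 more None examples suggested
```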

Do leverage the suggest feature for active learning

Use active learning's Review endpoint utterances feature on a regular basis, instead of adding more example utterances to intents. Because the app is constantly receiving endpoint utterances, this list keeps growing and changing.

Do monitor the performance of your app

Monitor the prediction accuracy using a batch test set.

Keep a separate set of utterances that aren't used as example utterances or endpoint utterances. Keep improving the app against your test set. Adapt the test set to reflect real user utterances. Use this test set to evaluate each iteration or version of the app.
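
The evaluation loop itself can be sketched independently of LUIS. In the example below, `predict_intent` stands in for whatever call your client makes to the published endpoint; here it is replaced by a toy keyword stub so the sketch runs on its own. The test utterances and the `run_batch_test` helper are hypothetical.

```python
def run_batch_test(test_set, predict_intent):
    """Return (accuracy, failures) for a list of (utterance, expected_intent)."""
    if not test_set:
        raise ValueError("test set is empty")
    failures = []
    for utterance, expected in test_set:
        actual = predict_intent(utterance)
        if actual != expected:
            failures.append((utterance, expected, actual))
    accuracy = (len(test_set) - len(failures)) / len(test_set)
    return accuracy, failures


def stub_predict(utterance):
    """Toy stand-in for a real endpoint call; matches on a single keyword."""
    text = utterance.lower()
    if "cancel" in text:
        return "CancelFlight"
    if "ticket" in text or "flight" in text:
        return "BookFlight"
    return "None"


test_set = [
    ("Buy 1 ticket to Seattle", "BookFlight"),
    ("Cancel my flight to Paris", "CancelFlight"),
    ("What is the weather like", "None"),
    ("Reserve two seats on the red eye to Paris", "BookFlight"),
]
accuracy, failures = run_batch_test(test_set, stub_predict)
print(accuracy)  # 0.75
print(failures)
```

Running the same test set against each version makes regressions visible, and the `failures` list shows which utterances to investigate first.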

Don't add many example utterances to intents

After the app is published, only add utterances from active learning in the development lifecycle process. If utterances are too similar, add a pattern.

Don't use few or simple entities

Entities are built for data extraction and prediction. It is important that each intent has machine-learned entities that describe the data in the intent. This helps LUIS predict the intent, even if your client application doesn't need to use the extracted entity.

Don't use LUIS as a training platform

LUIS is specific to a language model's domain. It isn't meant to work as a general natural language training platform.

Don't add many example utterances of the same format, ignoring other formats

LUIS expects variations in an intent's utterances. The utterances can vary while having the same overall meaning. Variations can include utterance length, word choice, and word placement.

Don't use same format | Do use varying format
Buy a ticket to Seattle | Buy 1 ticket to Seattle
Buy a ticket to Paris | Reserve two seats on the red eye to Paris next Monday
Buy a ticket to Orlando | I would like to book 3 tickets to Orlando for spring break

The second column uses different verbs (buy, reserve, book), different quantities (1, two, 3), and different arrangements of words, but all have the same intention of purchasing airline tickets for travel.

Don't mix the definition of intents and entities

Create an intent for any action your bot will take. Use entities as parameters that make that action possible.

For a bot that will book airline flights, create a BookFlight intent. Do not create an intent for every airline or every destination. Use those pieces of data as entities and mark them in the example utterances.
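
On the client side, this split means one handler per intent, with entities passed in as parameters. The sketch below is a hypothetical illustration: the `Prediction` class and the entity names `airline` and `destination` are invented for the example and are not the LUIS response format.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """Hypothetical, simplified prediction: one intent plus its entities."""
    intent: str
    entities: dict = field(default_factory=dict)


def handle(prediction):
    """Dispatch on the intent (the action); entities are its parameters."""
    if prediction.intent == "BookFlight":
        destination = prediction.entities.get("destination")
        if destination is None:
            return "Where would you like to fly?"
        airline = prediction.entities.get("airline", "any airline")
        return f"Booking a flight to {destination} on {airline}."
    return "Sorry, I can't help with that."


print(handle(Prediction("BookFlight", {"destination": "Paris"})))
print(handle(Prediction("BookFlight", {"destination": "Seattle", "airline": "Contoso Air"})))
```

A single BookFlight handler covers every airline and destination; adding a new destination requires no new intent, only more labeled entity examples.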

Don't add many patterns

Don't add too many patterns. LUIS is meant to learn quickly with fewer examples. Don't overload the system unnecessarily.

Don't train and publish with every single example utterance

Add 10 or 15 utterances before training and publishing. That allows you to see the impact on prediction accuracy. Adding a single utterance may not have a visible impact on the score.

Next steps