Entities and their purpose in LUIS

The primary purpose of entities is to give the client application predictable extraction of data. An optional, secondary purpose is to boost the prediction of the intent or of other entities with descriptors.

There are two types of entities:

  • machine-learned - learned from context
  • non-machine-learned - for exact text matches, pattern matches, or detection by prebuilt entities

Machine-learned entities provide the widest range of data extraction choices. Non-machine-learned entities work by text matching and may be used independently or as a constraint on a machine-learned entity.

Entities represent data

Entities are data you want to pull from the utterance, such as names, dates, product names, or any significant group of words. An utterance can include many entities or none at all. A client application may need the data to perform its task.

Entities need to be labeled consistently across all training utterances for each intent in a model.

You can define your own entities or use prebuilt entities to save time for common concepts such as datetimeV2, ordinal, email, and phone number.

| Utterance | Entity | Data |
|---|---|---|
| Buy 3 tickets to New York | Prebuilt number; Location.Destination | 3; New York |
| Buy a ticket from New York to London on March 5 | Location.Origin; Location.Destination; Prebuilt datetimeV2 | New York; London; March 5, 2018 |
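
As a minimal sketch of how a client application might read the extracted data for the second utterance, consider the following Python. The dictionary layout is a simplified assumption made for illustration, not the exact LUIS response format, and `extracted` and `first_value` are hypothetical names.

```python
# Hypothetical, simplified view of the entities extracted from
# "Buy a ticket from New York to London on March 5".
# Keys mirror the entity names in the table; the structure is an assumption.
extracted = {
    "Location.Origin": ["New York"],
    "Location.Destination": ["London"],
    "datetimeV2": ["March 5, 2018"],
}


def first_value(entities, name):
    """Return the first extracted value for an entity, or None if it is absent."""
    values = entities.get(name, [])
    return values[0] if values else None


origin = first_value(extracted, "Location.Origin")
destination = first_value(extracted, "Location.Destination")
travel_date = first_value(extracted, "datetimeV2")
# No number entity occurs in this utterance, so the lookup yields None.
passenger_count = first_value(extracted, "number")

print(origin, destination, travel_date, passenger_count)
```

Returning `None` for an absent entity lets the client decide whether the missing data matters for its task.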

Entities are optional

While intents are required, entities are optional. You do not need to create entities for every concept in your app, only for those the client application needs in order to take action.

If your utterances do not contain data the client application requires, you do not need to add entities. As your application develops and a new need for data is identified, you can add appropriate entities to your LUIS model later.

Entity compared to intent

The entity represents a data concept inside the utterance that you want extracted.

An utterance may optionally include entities. By comparison, the prediction of the intent for an utterance is required and represents the entire utterance. LUIS requires that example utterances be contained in an intent.

Consider the following four utterances:

| Utterance | Intent predicted | Entities extracted | Explanation |
|---|---|---|---|
| Help | help | - | Nothing to extract. |
| Send something | sendSomething | - | Nothing to extract. The model has not been trained to extract something in this context, and there is no recipient either. |
| Send Bob a present | sendSomething | Bob, present | The model has been trained with the personName prebuilt entity, which has extracted the name Bob. A machine-learned entity has been used to extract present. |
| Send Bob a box of chocolates | sendSomething | Bob, box of chocolates | The two important pieces of data, Bob and the box of chocolates, have been extracted by entities. |
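
The contrast between a required intent and optional entities can be sketched in Python as follows. The `Prediction` class, the `describe` helper, and the entity name `item` are hypothetical and exist only for this illustration; they are not part of any LUIS SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """Hypothetical container: an intent is always present, entities may be empty."""
    intent: str
    entities: dict = field(default_factory=dict)


def describe(prediction):
    """Build a short summary the client application could act on."""
    if not prediction.entities:
        return f"{prediction.intent}: nothing to extract"
    parts = ", ".join(f"{name}={value}" for name, value in prediction.entities.items())
    return f"{prediction.intent}: {parts}"


# "Help" has an intent but no entities.
help_prediction = Prediction(intent="help")

# "Send Bob a box of chocolates" has an intent and two entities.
gift_prediction = Prediction(
    intent="sendSomething",
    entities={"personName": "Bob", "item": "box of chocolates"},
)

print(describe(help_prediction))
print(describe(gift_prediction))
```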

Design entities for decomposition

It is good entity design to make your top-level entity a machine-learned entity. This lets you change your entity design over time and use subcomponents (child entities), optionally with constraints and descriptors, to decompose the top-level entity into the parts needed by the client application.

Designing for decomposition allows LUIS to return a deep degree of entity resolution to your client application. Your client application can then focus on business rules and leave data resolution to LUIS.

Machine-learned entities are primary data collections

Machine-learned entities are the top-level data unit. Subcomponents are child entities of machine-learned entities.

A machine-learned entity triggers based on the context learned through training utterances. Constraints are optional rules applied to a machine-learned entity that further constrain triggering based on the exact-text matching definition of a non-machine-learned entity such as a List or Regex. For example, a size machine-learned entity can have a constraint of a sizeList list entity, which constrains the size entity to trigger only when values contained within the sizeList entity are encountered.
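
The effect of such a constraint can be pictured with a small Python sketch. This is a conceptual illustration only and does not describe how LUIS is implemented; the values in `SIZE_LIST` and the `apply_constraint` helper are assumptions invented for the example.

```python
# Conceptual illustration only: this is not how LUIS works internally.
# Hypothetical values of the sizeList list entity.
SIZE_LIST = {"small", "medium", "large"}


def apply_constraint(candidates, allowed_values):
    """Keep only candidate spans whose text exactly matches an allowed value."""
    return [text for text in candidates if text.lower() in allowed_values]


# Suppose the machine-learned size entity proposed these spans from context.
candidates = ["Large", "huge"]

# Only spans that also match the list entity survive the constraint.
size_values = apply_constraint(candidates, SIZE_LIST)
print(size_values)
```

Here "huge" is dropped because it is not in the list, even though context suggested it was a size.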

When you create a phrase list feature in your LUIS app, it is enabled globally by default and applies evenly across all intents and entities. However, if you apply the phrase list as a descriptor (feature) of a machine-learned entity (or model), its scope narrows to that model only, and it is no longer used with the other models. Using a phrase list as a descriptor of a model helps decomposition by improving the accuracy of the model it is applied to.

Types of entities

Choose the entity based on how the data should be extracted and how it should be represented after it is extracted.

| Entity type | Purpose |
|---|---|
| Machine-learned | Machine-learned entities learn from context in the utterance. Parent grouping of entities, regardless of entity type. This makes variation of placement in example utterances significant. |
| List | List of items and their synonyms extracted with exact text match. |
| Pattern.any | Entity where the end of the entity is difficult to determine. |
| Prebuilt | Already trained to extract a specific kind of data such as URL or email. Some of these prebuilt entities are defined in the open-source Recognizers-Text project. If your specific culture or entity isn't currently supported, contribute to the project. |
| Regular Expression | Uses a regular expression for exact text match. |
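
To make the exact-text-match entity types more concrete, here is a rough Python sketch of list-style and regular-expression-style matching. It only mimics the idea; the city synonyms, the flight-number pattern, and the `match_list_entity` helper are hypothetical and are not taken from LUIS.

```python
import re

# Hypothetical list entity: canonical value -> accepted spellings and synonyms.
CITY_LIST = {
    "Seattle": ["seattle", "sea"],
    "Cairo": ["cairo", "cai"],
}

# Hypothetical regular expression entity for a flight number such as "AB1234".
FLIGHT_NUMBER = re.compile(r"\b[A-Z]{2}\d{3,4}\b")


def match_list_entity(utterance, list_entity):
    """Return canonical values for which any listed spelling appears as a whole word."""
    words = set(re.findall(r"\w+", utterance.lower()))
    return [
        canonical
        for canonical, synonyms in list_entity.items()
        if words & set(synonyms)
    ]


utterance = "Is flight AB1234 from SEA to Cairo on time?"
cities = match_list_entity(utterance, CITY_LIST)
flight_numbers = FLIGHT_NUMBER.findall(utterance)
print(cities, flight_numbers)
```

Neither matcher uses context: "SEA" resolves to Seattle purely because it is listed as a synonym.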

An utterance may contain two or more occurrences of an entity where the meaning of the data is based on context within the utterance. An example is an utterance for booking a flight that has two locations, origin and destination.

Book a flight from Seattle to Cairo

Both occurrences of the location entity need to be extracted. The client application needs to know the type of location for each in order to complete the ticket purchase.

There are two techniques for extracting contextually related data:

  • The location entity is a machine-learned entity and uses two subcomponent entities to capture the origin and destination (preferred)
  • The location entity uses two roles, origin and destination

Multiple entities can exist in an utterance and can be extracted without using decomposition or roles if the context in which they are used has no significance. For example, the utterance I want to travel to Seattle, Cairo, and London. includes a list of locations in which no item carries an additional meaning.

Using subcomponent entities of a machine-learned entity to define context

You can use a machine-learned entity to extract the data that describes the action of booking a flight and then decompose the top-level entity into the separate parts needed by the client application.

In this example, Book a flight from Seattle to Cairo, the top-level entity could be travelAction, labeled to extract flight from Seattle to Cairo. Two subcomponent entities are then created, called origin and destination, both with a constraint of the prebuilt geographyV2 entity applied. In the training utterances, the origin and destination are labeled appropriately.
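
A client application might consume such a decomposed entity along the following lines. This is a sketch under stated assumptions: the nested `travel_action` dictionary and the `build_booking_request` helper are hypothetical, and the parent/child layout is invented for illustration rather than copied from the LUIS response format.

```python
# Hypothetical nested result for "Book a flight from Seattle to Cairo".
# The parent/child layout is an assumption made for illustration.
travel_action = {
    "text": "flight from Seattle to Cairo",
    "children": {
        "origin": "Seattle",
        "destination": "Cairo",
    },
}


def build_booking_request(entity):
    """Turn the decomposed entity into the fields a booking step needs."""
    children = entity.get("children", {})
    missing = [part for part in ("origin", "destination") if part not in children]
    if missing:
        raise ValueError(f"missing subcomponents: {', '.join(missing)}")
    return {"from": children["origin"], "to": children["destination"]}


booking = build_booking_request(travel_action)
print(booking)
```

Because the subcomponents arrive already separated, the client code contains only the business rule (both parts are required) and no text parsing.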

Using entity roles to define context

A role is a named alias for an entity based on context within the utterance. A role can be used with any prebuilt or custom entity type, in both example utterances and patterns. In this example, the location entity needs two roles, origin and destination, and both need to be marked in the example utterances.

If LUIS finds the location but can't determine the role, the location entity is still returned. The client application would need to follow up with a question to determine which type of location the user meant.
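
The follow-up behavior described above could look roughly like this in a client application. The `(text, role)` pair format and both helper functions are hypothetical; they only illustrate the fallback logic and do not reflect the actual LUIS response shape.

```python
def resolve_locations(locations):
    """Split location values by role and collect those without a usable role.

    Each item is a hypothetical (text, role) pair; role is None when
    the role could not be determined.
    """
    resolved = {}
    unresolved = []
    for text, role in locations:
        if role in ("origin", "destination"):
            resolved[role] = text
        else:
            unresolved.append(text)
    return resolved, unresolved


def follow_up_question(unresolved):
    """Build a question about the first unresolved location, or None if there is none."""
    if not unresolved:
        return None
    return f"Is {unresolved[0]} where you are leaving from or where you are going?"


# Seattle came back with a role, Cairo came back as a plain location.
resolved, unresolved = resolve_locations([("Seattle", "origin"), ("Cairo", None)])
question = follow_up_question(unresolved)
print(resolved, question)
```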

If you need more than the maximum number of entities

If you need more than the limit, contact support. To do so, gather detailed information about your system, go to the LUIS website, and then select Support. If your Azure subscription includes support services, contact Azure technical support.

Entity prediction status

The LUIS portal shows when the entity in an example utterance has a different entity prediction than the entity you selected. This difference is based on the currently trained model.

Next steps

Learn the concepts behind good utterances.