Extract data from utterance text with intents and entities

LUIS gives you the ability to get information from a user's natural language utterances. The information is extracted in a form that a program, application, or chat bot can use to take action. The following sections show, with JSON examples, what data is returned from intents and entities.

The hardest data to extract is the machine-learning data because it isn't an exact text match. Data extraction of the machine-learning entities needs to be part of the authoring cycle until you're confident you receive the data you expect.

Data location and key usage

LUIS extracts data from the user's utterance at the published endpoint. The HTTPS request (POST or GET) contains the utterance as well as some optional configurations, such as the staging or production environment.

V2 prediction endpoint request

https://api.cognitive.azure.cn/luis/v2.0/apps/<appID>?subscription-key=<subscription-key>&verbose=true&timezoneOffset=0&q=book 2 tickets to paris

V3 prediction endpoint request

https://api.cognitive.azure.cn/luis/v3.0-preview/apps/<appID>/slots/<slot-type>/predict?subscription-key=<subscription-key>&verbose=true&timezoneOffset=0&query=book 2 tickets to paris

The appID is available on the Settings page of your LUIS app, and also as part of the URL (after /apps/) when you're editing that LUIS app. The subscription-key is the endpoint key used for querying your app. While you can use your free authoring/starter key while you're learning LUIS, it is important to change the endpoint key to a key that supports your expected LUIS usage. The timezoneOffset unit is minutes.
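The two request formats above can be assembled programmatically. The following is a minimal sketch, assuming only Python's standard library; the function names are hypothetical, and the host, paths, and parameter names are copied from the example requests above. It percent-encodes the utterance, which the raw example URLs leave unencoded.

```python
from urllib.parse import quote, urlencode

# Host taken from the example requests above.
BASE = "https://api.cognitive.azure.cn/luis"


def v2_prediction_url(app_id, key, utterance, verbose=True, timezone_offset=0):
    """Build a V2 prediction URL; the utterance goes in the `q` parameter."""
    params = {
        "subscription-key": key,
        "verbose": "true" if verbose else "false",
        "timezoneOffset": timezone_offset,  # unit is minutes
        "q": utterance,
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{BASE}/v2.0/apps/{app_id}?{urlencode(params, quote_via=quote)}"


def v3_prediction_url(app_id, slot, key, utterance, verbose=True, timezone_offset=0):
    """Build a V3 (preview) prediction URL; the utterance goes in `query`."""
    params = {
        "subscription-key": key,
        "verbose": "true" if verbose else "false",
        "timezoneOffset": timezone_offset,  # unit is minutes
        "query": utterance,
    }
    return (
        f"{BASE}/v3.0-preview/apps/{app_id}/slots/{slot}/predict"
        f"?{urlencode(params, quote_via=quote)}"
    )


if __name__ == "__main__":
    # APP_ID and KEY are placeholders, not real credentials.
    print(v2_prediction_url("APP_ID", "KEY", "book 2 tickets to paris"))
    print(v3_prediction_url("APP_ID", "production", "KEY", "book 2 tickets to paris"))
```

The slot argument is assumed to be either staging or production, matching the environments described above.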

The HTTPS response contains all the intent and entity information LUIS can determine based on the current published model of either the staging or production endpoint. The endpoint URL is found on the LUIS website, in the Manage section, on the Keys and endpoints page.

Data from intents

The primary data is the top scoring intent name. The endpoint response is:

{
  "query": "when do you open next?",
  "topScoringIntent": {
    "intent": "GetStoreInfo",
    "score": 0.984749258
  },
  "entities": []
}
| Data Object | Data Type | Data Location | Value |
|---|---|---|---|
| Intent | String | topScoringIntent.intent | "GetStoreInfo" |
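Reading the top scoring intent from this response takes only a dictionary lookup. The sketch below assumes the response body has already been received as text; the helper name top_intent is hypothetical, and the JSON is the response shown above.

```python
import json

# The endpoint response shown above, as it would arrive in the HTTPS body.
RESPONSE_TEXT = """
{
  "query": "when do you open next?",
  "topScoringIntent": {
    "intent": "GetStoreInfo",
    "score": 0.984749258
  },
  "entities": []
}
"""


def top_intent(response):
    """Return (intent name, score) from the topScoringIntent object."""
    top = response["topScoringIntent"]
    return top["intent"], top["score"]


if __name__ == "__main__":
    name, score = top_intent(json.loads(RESPONSE_TEXT))
    print(name, score)
```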

If your chat bot or LUIS-calling app makes a decision based on more than one intent score, return all the intents' scores.

Set the querystring parameter verbose=true. The endpoint response is:

{
  "query": "when do you open next?",
  "topScoringIntent": {
    "intent": "GetStoreInfo",
    "score": 0.984749258
  },
  "intents": [
    {
      "intent": "GetStoreInfo",
      "score": 0.984749258
    },
    {
      "intent": "None",
      "score": 0.0168218873
    }
  ],
  "entities": []
}

The intents are ordered from highest to lowest score.

| Data Object | Data Type | Data Location | Value | Score |
|---|---|---|---|---|
| Intent | String | intents[0].intent | "GetStoreInfo" | 0.984749258 |
| Intent | String | intents[1].intent | "None" | 0.0168218873 |
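One way an app might decide based on more than one intent score is to require both a minimum top score and a minimum gap to the runner-up. The following is an illustrative sketch only: the function name and both thresholds are assumptions to tune per app, not LUIS settings. The sample data uses the intent names and scores listed in the table above.

```python
def pick_intent(response, min_score=0.5, min_margin=0.2):
    """Return the top intent name, or None if the prediction is weak or ambiguous.

    min_score and min_margin are hypothetical thresholds chosen for illustration.
    """
    # Sort defensively instead of relying on the order of the array.
    ranked = sorted(response["intents"], key=lambda item: item["score"], reverse=True)
    if not ranked:
        return None
    best = ranked[0]
    runner_up_score = ranked[1]["score"] if len(ranked) > 1 else 0.0
    if best["score"] < min_score:
        return None  # nothing scored high enough
    if best["score"] - runner_up_score < min_margin:
        return None  # top two intents are too close to call
    return best["intent"]


# Intents array with the scores from the table above.
VERBOSE_RESPONSE = {
    "query": "when do you open next?",
    "intents": [
        {"intent": "GetStoreInfo", "score": 0.984749258},
        {"intent": "None", "score": 0.0168218873},
    ],
    "entities": [],
}

if __name__ == "__main__":
    print(pick_intent(VERBOSE_RESPONSE))
```

When the function returns None, the calling app could ask the user to rephrase rather than act on a low-confidence intent.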

If you add prebuilt domains, the intent name indicates the domain, such as Utilities or Communication, as well as the intent:

{
  "query": "Turn on the lights next monday at 9am",
  "topScoringIntent": {
    "intent": "Utilities.ShowNext",
    "score": 0.07842206
  },
  "intents": [
    {
      "intent": "Utilities.ShowNext",
      "score": 0.07842206
    },
    {
      "intent": "Communication.StartOver",
      "score": 0.0239675418
    },
    {
      "intent": "None",
      "score": 0.0168218873
    }],
  "entities": []
}
| Domain | Data Object | Data Type | Data Location | Value |
|---|---|---|---|---|
| Utilities | Intent | String | intents[0].intent | "Utilities.ShowNext" |
| Communication | Intent | String | intents[1].intent | "Communication.StartOver" |
| | Intent | String | intents[2].intent | "None" |
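Because the domain is carried as a prefix of the intent name, a calling app can separate the two with a string split. This is a minimal sketch with a hypothetical helper name. It assumes the first dot separates domain from intent, so a custom intent whose own name contains a dot would be misread.

```python
def split_intent_name(name):
    """Split "Domain.Intent" into (domain, intent); domain is None if there is no dot."""
    domain, dot, intent = name.partition(".")
    if not dot:
        return None, name
    return domain, intent


if __name__ == "__main__":
    for intent_name in ("Utilities.ShowNext", "Communication.StartOver", "None"):
        print(split_intent_name(intent_name))
```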

Data from entities

Most chat bots and applications need more than the intent name. This additional, optional data comes from entities discovered in the utterance. Each type of entity returns different information about the match.

A single word or phrase in an utterance can match more than one entity. In that case, each matching entity is returned with its score.

All entities are returned in the entities array of the response from the endpoint.

Tokenized entity returned

Review the token support in LUIS.

Prebuilt entity data

Prebuilt entities are discovered based on a regular expression match using the open-source Recognizers-Text project. Prebuilt entities are returned in the entities array and use the type name prefixed with builtin::.

List entity data

List entities represent a fixed, closed set of related words along with their synonyms. LUIS does not discover additional values for list entities. Use the Recommend feature to see suggestions for new words based on the current list. If there is more than one list entity with the same value, each entity is returned in the endpoint query.

Regular expression entity data

A regular expression entity extracts an entity based on a regular expression you provide.

Extracting names

Getting names from an utterance is difficult because a name can be almost any combination of letters and words. Depending on what type of name you're extracting, you have several options. The following suggestions are guidelines rather than rules.

Add prebuilt PersonName and GeographyV2 entities

PersonName and GeographyV2 entities are available in some language cultures.

Names of people

People's names can vary slightly in format depending on language and culture. Use either a prebuilt personName entity or a simple entity with roles of first and last name.

If you use the simple entity, make sure to give examples that use the first and last name in different parts of the utterance, in utterances of different lengths, and in utterances across all intents, including the None intent. Review endpoint utterances on a regular basis to label any names that were not predicted correctly.

Names of places

Location names are set and known, such as cities, counties, states, provinces, and countries/regions. Use the prebuilt entity geographyV2 to extract location information.

New and emerging names

Some apps need to be able to find new and emerging names, such as products or companies. These types of names are the most difficult type of data to extract. Begin with a simple entity and add a phrase list. Review endpoint utterances on a regular basis to label any names that were not predicted correctly.

Pattern.any entity data

Pattern.any is a variable-length placeholder used only in a pattern's template utterance to mark where the entity begins and ends. The entity used in the pattern must be found in order for the pattern to be applied.

Sentiment analysis

If sentiment analysis is configured while publishing, the LUIS JSON response includes sentiment analysis. Learn more about sentiment analysis in the Text Analytics documentation.

Key phrase extraction entity data

The key phrase extraction entity returns key phrases in the utterance, provided by Text Analytics.

Data matching multiple entities

LUIS returns all entities discovered in the utterance. As a result, your chat bot may need to make a decision based on the results.

Data matching multiple list entities

If a word or phrase matches more than one list entity, the endpoint query returns each list entity.

For the query when is the best time to go to red rock?, if the app has the word red in more than one list, LUIS recognizes all the entities and returns an array of entities as part of the JSON endpoint response.
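A calling app can detect this situation by grouping the returned entities by the text span they cover. The sketch below is illustrative and rests on several assumptions not shown in this article: each item in the entities array is taken to carry type, startIndex, and endIndex fields with an inclusive end index, and the list entity names Colors and Places are hypothetical.

```python
from collections import defaultdict


def overlapping_matches(entities):
    """Map each (startIndex, endIndex) span to its entity types when more than one matched."""
    by_span = defaultdict(list)
    for entity in entities:
        by_span[(entity["startIndex"], entity["endIndex"])].append(entity["type"])
    return {span: types for span, types in by_span.items() if len(types) > 1}


QUERY = "when is the best time to go to red rock?"
START = QUERY.index("red")
END = START + len("red") - 1  # assumed: endIndex is inclusive

# Hypothetical entities array: the word "red" appears in two list entities.
ENTITIES = [
    {"entity": "red", "type": "Colors", "startIndex": START, "endIndex": END},
    {"entity": "red", "type": "Places", "startIndex": START, "endIndex": END},
]

if __name__ == "__main__":
    for (start, end), types in overlapping_matches(ENTITIES).items():
        print(QUERY[start:end + 1], "matched", types)
```

How to resolve the ambiguity (for example, preferring one list based on the top scoring intent) is an application decision.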

Next steps

See Add entities to learn more about how to add entities to your LUIS app.