Tutorial: Extract structured data from user utterances with machine-learning entities in Language Understanding (LUIS)

In this tutorial, you extract structured data from an utterance by using a machine-learning entity.

The machine-learning entity supports the model decomposition concept by letting you give its subentities features.

In this tutorial, you learn how to:

  • Import the example app
  • Add a machine-learning entity
  • Add subentities and features
  • Train, test, and publish the app
  • Get entity predictions from the endpoint

For this article, you can use the free LUIS account and its starter key in order to author your LUIS application.

Why use a machine-learning entity?

This tutorial adds a machine-learning entity to extract data from a user's utterance.

The entity defines the data to extract from the utterance. This includes giving the data a name, a type (if possible), a resolution of the data if it is ambiguous, and the exact text that makes up the data.

To define the data, you need to:

  • Create the entity.
  • Label the text that represents the entity within example utterances. These labeled examples teach LUIS what the entity is and where it can be found in an utterance.

Entity decomposability is important

Entity decomposability is important both for intent prediction and for data extraction with the entity.

Start with a machine-learning entity, which is the beginning, top-level entity for data extraction. Then decompose the entity into subentities.

You may not know how detailed your entity needs to be when you begin your app, so a best practice is to start with a machine-learning entity and then decompose it with subentities as your app matures.

In this tutorial, you create a machine-learning entity to represent an order for a pizza app. The entity extracts order-related text and pulls out the size and quantity.

The utterance Please deliver one large cheese pizza to me should extract one large cheese pizza as the order, and also extract 1 for quantity and large for size.
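To make "labeling" concrete, the minimal Python sketch below records where each piece of that example sits in the utterance: the Order span, with Quantity and Size nested inside it. The dictionary layout, the label_span helper, and the end-exclusive offsets are hypothetical choices for this illustration, not the format LUIS stores internally. Turning the text one into the value 1 is left to the prebuilt number entity added later in the tutorial.

```python
from typing import Tuple

utterance = "Please deliver one large cheese pizza to me"

def label_span(text: str, phrase: str, start: int = 0) -> Tuple[int, int]:
    """Return end-exclusive (start, end) offsets of the first match of phrase at or after start."""
    begin = text.index(phrase, start)  # raises ValueError if the phrase is missing
    return begin, begin + len(phrase)

# Hypothetical label structure: a parent Order span with subentity children.
order = label_span(utterance, "one large cheese pizza")
labels = {
    "entity": "Order",
    "span": order,
    "children": [
        # Start searching for each subentity at the parent span's start offset.
        {"entity": "Quantity", "span": label_span(utterance, "one", order[0])},
        {"entity": "Size", "span": label_span(utterance, "large", order[0])},
    ],
}

# Every subentity label must sit inside its parent Order label.
for child in labels["children"]:
    s, e = child["span"]
    assert order[0] <= s and e <= order[1], child

for label in [labels, *labels["children"]]:
    s, e = label["span"]
    print(f'{label["entity"]}: {utterance[s:e]!r} at [{s}, {e})')
```

The nesting mirrors the decomposition idea: the subentities only make sense inside the parent Order span.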

Download JSON file for app

Download and save the app JSON file.

Import JSON file for app

  1. In the LUIS portal, on the My apps page, select + New app for conversation, then Import as JSON. Find the JSON file you saved in the previous step. You don't need to change the name of the app. Select Done.

  2. From the Manage section, on the Versions tab, select the 0.1 version, then select Clone to clone the version, give it the new name ml-entity, and select Done to finish the clone process. Because the version name is used as part of the URL route, the name can't contain any characters that are not valid in a URL.

    Tip

    Cloning into a new version is a best practice before you modify your app. When you finish a change to a version, export the version (as a .json or .lu file) and check the file into your source control system.

  3. Select Build, then Intents, to see the intents, the main building blocks of a LUIS app.

    Screenshot: changing from the Versions page to the Intents page.
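Because the version name becomes part of the URL route, it helps to check a name like ml-entity before cloning. The minimal Python sketch below is a conservative pre-check that accepts only the RFC 3986 "unreserved" characters (letters, digits, -, ., _, ~), which never need escaping in a URL path. The function name is hypothetical, and the exact set of characters the LUIS portal accepts is an assumption here; it may be broader or narrower.

```python
import string

# RFC 3986 "unreserved" characters: safe in a URL path segment without escaping.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def is_url_safe_version_name(name: str) -> bool:
    """Conservatively accept a version name only if every character is unreserved."""
    return bool(name) and all(ch in UNRESERVED for ch in name)

for candidate in ["ml-entity", "0.1", "ml entity", "ml/entity"]:
    print(candidate, is_url_safe_version_name(candidate))
```

A name that fails this check is not necessarily rejected by the portal; the check only guarantees that a passing name needs no URL escaping.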

Create machine-learning entity

To extract details about a pizza order, create a top-level machine-learning entity named Order.

  1. On the Intents page, select the OrderPizza intent.

  2. In the example utterances list, select the following utterance.

    Order example utterance
    pickup a cheddar cheese pizza large with extra anchovies

    Begin selecting just before the left-most text, pickup (#1), then go just past the right-most text, anchovies (#2); this ends the labeling process. A pop-up menu appears. In the pop-up box, enter Order as the name of the entity (#3). Then select Order Create new entity from the list (#4).

    Screenshot: labeling the beginning and end of the full order text.

    Note

    An entity won't always be the entire utterance. In this specific case, pickup indicates how the order is to be received, so conceptually pickup belongs in the labeled entity for the order.

  3. In the Choose an entity type box, select Add Structure, then select Next. Structure is necessary to add subentities such as size and quantity.

    Screenshot shows the Choose an entity type window with the Add Structure option selected.

  4. In the Add subentities (optional) box, select + on the Order row, add Size and Quantity as subentities, then select Create.

    Screenshot shows the Add subentities (optional) window with subentities highlighted.

Edit subentities to improve extraction

The previous steps create the entity and its subentities. To improve extraction, add features to the subentities.

Improve size extraction with phrase list

  1. Select Entities from the left menu, then select the Order entity.

  2. On the Schema and features tab, select the Size subentity, then select + Add feature.

  3. Select Create new phrase list from the drop-down menu.

  4. In the Create new phrase list box, enter the name SizePhraselist, then enter the values small, medium, and large. When the Suggestions box fills in, select extra large and xl. Select Create to create the new phrase list.

    This phrase list feature helps the Size subentity find size-related words by giving it example words. The phrase list doesn't need to include every size word, but it should include the words that are expected to indicate size.
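A phrase list acts as a signal for the model rather than a hard matching rule. Purely as an illustration, the toy Python function below shows which words in an utterance the SizePhraselist values would flag. The whole-word, case-insensitive matching and the phrase_hits name are assumptions of this sketch; this is not how LUIS computes features internally.

```python
from typing import List

# Values entered for SizePhraselist in this tutorial.
SIZE_PHRASES = ["small", "medium", "large", "extra large", "xl"]

def phrase_hits(utterance: str, phrases: List[str]) -> List[str]:
    """Return the phrases that occur as whole-word sequences in the utterance.

    Matching is case-insensitive and splits on whitespace only; this is a toy
    signal for illustration, not LUIS's internal feature computation.
    """
    words = utterance.lower().split()
    hits = []
    for phrase in phrases:
        target = phrase.lower().split()
        n = len(target)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.append(phrase)
    return hits

print(phrase_hits("I want an extra large pizza", SIZE_PHRASES))
```

Multiword entries such as extra large are why the sketch compares word sequences rather than single tokens.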

Add SizeList entity

Adding a list of the known sizes that the client application recognizes also helps extraction.

  1. Select Entities from the left menu, then select + Create.

  2. Set the entity name to SizeListentity and set the Type to List, so the entity is easy to tell apart from the SizePhraselist created in the previous section.

  3. Add the sizes the client application expects, Small, Medium, Large, and XLarge, then add synonyms for each. The synonyms should be the terms a user enters in the chat bot. A list entity extracts text only when the text exactly matches a normalized value or one of its synonyms.

    Normalized value    Synonyms
    Small               sm, sml, tiny, smallest
    Medium              md, mdm, regular, average, middle
    Large               lg, lrg, big
    XLarge              xl, biggest, giant

    Screenshot shows the SizeList window and list items with XLarge selected.
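The exact-match behavior of a list entity can be pictured as a lookup from every normalized value and synonym to its normalized value. The minimal Python sketch below builds that lookup from the table above; resolve_size and find_sizes are hypothetical helpers, and the lowercasing of both sides is an assumption of this sketch rather than documented LUIS behavior.

```python
from typing import Dict, List, Optional, Tuple

# Normalized values and synonyms from the SizeListentity table above.
SIZE_LIST: Dict[str, List[str]] = {
    "Small": ["sm", "sml", "tiny", "smallest"],
    "Medium": ["md", "mdm", "regular", "average", "middle"],
    "Large": ["lg", "lrg", "big"],
    "XLarge": ["xl", "biggest", "giant"],
}

def build_lookup(list_entity: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized value and synonym (lowercased) to its normalized value."""
    lookup: Dict[str, str] = {}
    for normalized, synonyms in list_entity.items():
        for term in [normalized, *synonyms]:
            lookup[term.lower()] = normalized
    return lookup

LOOKUP = build_lookup(SIZE_LIST)

def resolve_size(text: str) -> Optional[str]:
    """Return the normalized size if text exactly matches a value or synonym, else None."""
    return LOOKUP.get(text.lower())

def find_sizes(utterance: str) -> List[Tuple[str, str]]:
    """Scan whitespace-separated tokens and resolve the ones that match exactly."""
    return [(tok, LOOKUP[tok.lower()]) for tok in utterance.split() if tok.lower() in LOOKUP]

print(find_sizes("2 sml cheese pizzas and a giant pepperoni"))
```

Unlike the phrase list, nothing near a match counts: resolve_size("extra large") returns None because that text is not a listed value or synonym.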

Add feature of SizeList entity

  1. Select Entities from the left menu to return to the list of entities.

  2. Select Order from the list of entities.

  3. On the Schema and features tab, select the Size subentity, then select + Add feature.

  4. Select @ SizeListentity from the drop-down list.

Add prebuilt number entity

Adding a prebuilt number entity also helps extraction.

  1. Select Entities from the left menu, then select Add prebuilt entity.

  2. Select Number from the list, then select Done.

  3. Select Entities from the left menu to return to the list of entities.

Add feature of prebuilt number entity

  1. Select Order from the list of entities.

  2. On the Schema and features tab, select the Quantity subentity, then select + Add feature.

  3. Select @ number from the drop-down list.

Configure required features

On the Entity detail page for the Order entity, select the asterisk (*) for both the @ SizeList feature and the @ number feature to make them required. The asterisk appears in the same label as the feature name.

Screenshot shows the @SizeList feature with the asterisk and the Require warning.

Label example utterances

The machine-learning entity is created, and its subentities have features. To finish improving extraction, label the example utterances with the subentities.

  1. Select Intents from the left navigation, then select the OrderPizza intent.

  2. To open the Entity Palette, select the @ symbol in the contextual toolbar.

  3. Select each entity row in the palette, then use the palette cursor to select the entity in each example utterance. When you finish, the entity list should look like the following image.

    Partial screenshot of configuring the required features

Train the app

To train the app, select Train. Training applies the changes, such as the new entities and the labeled utterances, to the active model.

Add a new example utterance

  1. After training, add a new example utterance to the OrderPizza intent to see how well LUIS understands the machine-learning entity.

    Order example utterance
    I need a large pepperoni pizza

    The overall top-level entity, Order, is labeled, and the Size subentity is also labeled with dotted lines.

    Partial screenshot of the new example utterance predicted with the entity

    The dotted line indicates the prediction based on the currently trained app.

  2. To change the prediction into a labeled entity, select the check mark on the same row.

    Screenshot shows an example utterance with the check mark highlighted.

    At this point, the machine-learning entity is working, because it finds the entity in a new example utterance. As you add example utterances, label the entity and its subentities whenever the entity is not predicted correctly. When the entity is predicted correctly, make sure to confirm the predictions.

Train the app to apply the entity changes to the app

Select Train to train the app with this new utterance.

At this point, the order has some details that can be extracted (size, quantity, and the total order text). You can refine the Order entity further with details such as pizza toppings, type of crust, and side orders. Each of those should be created as a subentity of the Order entity.

Test the app to validate the changes

Test the app by using the interactive Test panel. You enter a new utterance and view the prediction results to see how well the active, trained app is working. The intent prediction should be fairly confident (above 60%), and the entity extraction should pick up at least the Order entity. Details of the Order entity may be missing, because these few utterances aren't enough to handle every case.

  1. Select Test in the top navigation.

  2. Enter the utterance 2 small cheese pizzas for pickup and press Enter. The active model predicts the correct intent with over 60% confidence.

  3. Select Inspect to see the entity predictions.

    Partial screenshot of the entity predictions in the interactive test panel.

Publish the app to access it from the HTTP endpoint

To receive a LUIS prediction in a chat bot or other client application, you need to publish the app to the endpoint.

  1. Select Publish in the top-right navigation.

    Screenshot of the LUIS Publish to endpoint button in the top-right menu

  2. Select the Production slot, then select Change settings, select Sentiment Analysis, and then select Done.

    Screenshot of LUIS publish to endpoint

  3. Select the Access your endpoint URLs link in the notification to go to the Azure Resources page. The endpoint URLs are listed as the Example Query.

Get intent and entity prediction from HTTP endpoint

  1. On the Azure Resources page (left menu) of the Manage section (top-right menu), copy the Example Query URL, then paste it into a new browser tab.

    The endpoint URL has the following format, with your own custom subdomain, app ID, and endpoint key replacing YOUR-CUSTOM-SUBDOMAIN, APP-ID, and KEY-ID:

    https://YOUR-CUSTOM-SUBDOMAIN.api.cognitive.azure.cn/luis/prediction/v3.0/apps/APP-ID/slots/production/predict?subscription-key=KEY-ID&verbose=true&show-all-intents=true&log=true&query=YOUR_QUERY_HERE
    
  2. Go to the end of the URL in the address bar and replace YOUR_QUERY_HERE with the same query you entered in the interactive test panel.

    2 small cheese pizzas for pickup

    The last query string parameter is query, which holds the utterance. The browser displays a JSON response similar to the following:

    {
        "query": "2 small cheese pizzas for pickup",
        "prediction": {
            "topIntent": "OrderPizza",
            "intents": {
                "OrderPizza": {
                    "score": 0.7812769
                },
                "None": {
                    "score": 0.0314020254
                },
                "Confirm": {
                    "score": 0.009299271
                },
                "Greeting": {
                    "score": 0.007551549
                }
            },
            "entities": {
                "Order": [
                    {
                        "Size": [
                            "small"
                        ],
                        "Quantity": [
                            2
                        ]
                    }
                ]
            }
        },
        "sentimentAnalysis": {
            "label": "neutral",
            "score": 0.98
        }
    }
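A client application typically builds the same URL and reads the JSON instead of using a browser. The minimal Python sketch below assumes the V3 URL format and the response shape shown in this section; build_prediction_url and summarize are hypothetical helpers, not part of a LUIS SDK. It makes no network call: it only builds the URL and parses a trimmed copy of the sample response, so you would send the GET request with any HTTP client. The 0.6 threshold mirrors the tutorial's "above 60%" guidance and is a rule of thumb, not a LUIS setting. The sketch reads sentimentAnalysis where the sample response places it; confirm that location against your own endpoint response.

```python
import json
from urllib.parse import quote, urlencode

def build_prediction_url(subdomain: str, app_id: str, key: str, query: str,
                         slot: str = "production") -> str:
    """Build a V3 prediction URL in the format shown in this tutorial."""
    params = {
        "subscription-key": key,
        "verbose": "true",
        "show-all-intents": "true",
        "log": "true",
        "query": query,  # the utterance; percent-encoded below (spaces become %20)
    }
    return (f"https://{subdomain}.api.cognitive.azure.cn/luis/prediction/v3.0/"
            f"apps/{app_id}/slots/{slot}/predict?{urlencode(params, quote_via=quote)}")

def summarize(response: dict, threshold: float = 0.6) -> dict:
    """Pull the top intent, the first Order's subentities, and sentiment out of a response."""
    prediction = response["prediction"]
    top = prediction["topIntent"]
    score = prediction["intents"][top]["score"]
    orders = prediction.get("entities", {}).get("Order", [])
    first_order = orders[0] if orders else {}
    return {
        "intent": top,
        "confident": score > threshold,  # the tutorial's rough 60% bar
        "size": first_order.get("Size", []),
        "quantity": first_order.get("Quantity", []),
        # Location assumed from the sample response; verify against your own output.
        "sentiment": response.get("sentimentAnalysis", {}).get("label"),
    }

# A trimmed copy of the sample response in this section.
SAMPLE = json.loads("""
{
  "query": "2 small cheese pizzas for pickup",
  "prediction": {
    "topIntent": "OrderPizza",
    "intents": {"OrderPizza": {"score": 0.7812769}, "None": {"score": 0.0314020254}},
    "entities": {"Order": [{"Size": ["small"], "Quantity": [2]}]}
  },
  "sentimentAnalysis": {"label": "neutral", "score": 0.98}
}
""")

url = build_prediction_url("YOUR-CUSTOM-SUBDOMAIN", "APP-ID", "KEY-ID", SAMPLE["query"])
print(url)
print(summarize(SAMPLE))
```

Using quote instead of the default quote_plus keeps spaces as %20, which matches what a browser puts in the address bar.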
    

Clean up resources

When you no longer need the LUIS app, delete it. To do so, select My apps from the top-left menu. In the app list, select the ellipsis (...) to the right of the app name, then select Delete. In the Delete app? pop-up dialog, select OK.

Next steps

In this tutorial, the app uses a machine-learning entity to find the intent of a user's utterance and extract details from that utterance. Using a machine-learning entity lets you decompose the details of the entity into subentities.