Batch testing with a set of example utterances

Batch testing validates your active trained version to measure its prediction accuracy. A batch test helps you view the accuracy of each intent and entity in your active version. Review the batch test results to take appropriate action to improve accuracy, such as adding more example utterances to an intent when your app frequently fails to identify the correct intent, or labeling more entities within the utterances.

Group data for batch testing

It is important that the utterances used for batch testing are new to LUIS. If you have a data set of utterances, divide them into three sets: example utterances added to an intent, utterances received from the published endpoint, and utterances used to batch test LUIS after it is trained.
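This division can be scripted. The following is a minimal sketch, assuming a single JSON file of labeled utterances (utterances.json is a hypothetical name): it holds out a batch-test set and keeps the rest as example utterances for authoring. The third group, endpoint utterances, comes from real traffic to the published endpoint rather than from a file split.

import json
import random

# Hypothetical input file: a JSON array of labeled utterances.
with open("utterances.json", encoding="utf-8") as f:
    utterances = json.load(f)

random.seed(42)
random.shuffle(utterances)

# Keep roughly 80% as example utterances for authoring intents; hold out the
# rest for batch testing so those utterances stay new to LUIS. A batch file
# accepts at most 1,000 utterances.
split = int(len(utterances) * 0.8)
authoring_set = utterances[:split]
batch_test_set = utterances[split:split + 1000]

with open("authoring-utterances.json", "w", encoding="utf-8") as f:
    json.dump(authoring_set, f, indent=2)
with open("batch-test-utterances.json", "w", encoding="utf-8") as f:
    json.dump(batch_test_set, f, indent=2)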

The batch JSON file you use should include utterances with top-level machine-learning entities labeled, including start and end position. The utterances should not be part of the examples already in the app. They should be utterances you want to positively predict for intent and entities.

You can separate tests by intent and/or entity, or have all the tests (up to 1,000 utterances) in the same file.

Common errors importing a batch

If you run into errors uploading your batch file to LUIS, check for the following common issues (a quick validation sketch follows the list):

  • More than 1,000 utterances in a batch file
  • An utterance JSON object that doesn't have an entities property. The property can be an empty array.
  • Word(s) labeled in multiple entities
  • Entity labels starting or ending on a space
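Most of these issues can be caught before uploading. The sketch below is an informal local check against the limits above; the file name is the one used later in this article, and the check for words labeled in multiple entities is deliberately omitted because a parent machine-learning entity legitimately overlaps its own subentities.

import json

# Batch file to validate; adjust the name to your own file.
with open("pizza-with-machine-learned-entity-test.json", encoding="utf-8") as f:
    batch = json.load(f)

problems = []
if len(batch) > 1000:
    problems.append(f"{len(batch)} utterances; the limit is 1,000.")

for i, item in enumerate(batch):
    text = item.get("text", "")
    entities = item.get("entities")
    if entities is None:
        problems.append(f"Utterance {i}: missing 'entities' property (use [] if there are none).")
        continue
    for ent in entities:
        # endPos is inclusive, so slice one character past it.
        labeled_text = text[ent["startPos"]:ent["endPos"] + 1]
        if labeled_text != labeled_text.strip():
            problems.append(f"Utterance {i}: entity '{ent['entity']}' starts or ends on a space.")

print("\n".join(problems) or "No common import problems found.")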

Fixing batch errors

If there are errors in the batch testing, you can add more utterances to an intent and/or label more utterances with the entity to help LUIS discriminate between intents. If you have added and labeled utterances and still get prediction errors in batch testing, consider adding a phrase list feature with domain-specific vocabulary to help LUIS learn faster.

Batch testing using the LUIS portal

Import and train an example app

Import an app that takes a pizza order such as 1 pepperoni pizza on thin crust.

  1. Download and save the app JSON file.

  2. Sign in to the LUIS portal, and select your Subscription and Authoring resource to see the apps assigned to that authoring resource.

  3. Select the arrow next to New app and click Import as JSON to import the JSON into a new app. Name the app Pizza app.

  4. Select Train in the top-right corner of the navigation to train the app.

Roles in batch testing

Note

Entity roles are not supported in batch testing.

Batch test file

The example JSON includes one utterance with a labeled entity to illustrate what a test file looks like. In your own tests, you should have many utterances with the correct intent and machine-learning entities labeled.

  1. Create pizza-with-machine-learned-entity-test.json in a text editor or download it.

  2. In the JSON-formatted batch file, add an utterance with the intent you want predicted in the test.

[
    {
        "text": "I want to pick up 1 cheese pizza",
        "intent": "ModifyOrder",
        "entities": [
            {
                "entity": "Order",
                "startPos": 18,
                "endPos": 31
            },
            {
                "entity": "ToppingList",
                "startPos": 20,
                "endPos": 25
            }
        ]
    }
]

Run the batch

  1. Select Test in the top navigation bar.

  2. Select Batch testing panel in the right-side panel.

    Batch testing link

  3. Select Import. In the dialog box that appears, select Choose File and locate a JSON file in the correct JSON format that contains no more than 1,000 utterances to test.

    Import errors are reported in a red notification bar at the top of the browser. When an import has errors, no dataset is created. For more information, see Common errors.

  4. Choose the file location of the pizza-with-machine-learned-entity-test.json file.

  5. Name the dataset pizza test and select Done.

  6. Select the Run button. After the batch test runs, select See results.

    Tip

    • Selecting Download will download the same file that you uploaded.
    • If you see that the batch test failed, at least one utterance intent did not match the prediction.

Review batch results for intents

To review the batch test results, select See results. The test results show graphically how the test utterances were predicted against the active version.

The batch chart displays four quadrants of results. To the right of the chart is a filter that contains intents and entities. When you select a section of the chart or a point within the chart, the associated utterance(s) display below the chart.

While hovering over the chart, you can use the mouse wheel to enlarge or reduce the display. This is useful when there are many points clustered tightly together on the chart.

The chart is in four quadrants, with two of the sections displayed in red.

  1. Select the ModifyOrder intent in the filter list. The utterance is predicted as a True Positive, meaning the utterance successfully matched its positive prediction listed in the batch file.

    Utterance successfully matched its positive prediction

    The green checkmarks in the filters list also indicate the success of the test for each intent. All the other intents are listed with a 1/1 positive score because the utterance was tested against every intent, counting as a negative test for any intent not listed in the batch file.

  2. Select the Confirmation intent. This intent isn't listed in the batch test, so this is a negative test of the utterance that is listed in the batch test.

    Utterance successfully predicted negative for an intent not listed in the batch file

    The negative test was successful, as noted with the green text in the filter and the grid.

Review batch test results for entities

The ModifyOrder entity, as a machine-learning entity with subentities, displays whether the top-level entity matched and how the subentities were predicted.

  1. Select the ModifyOrder entity in the filter list, then select the circle in the grid.

  2. The entity prediction displays below the chart. The display includes solid lines for predictions that match the expectation and dotted lines for predictions that don't.

    Entity parent successfully predicted in the batch file

Filter chart results

To filter the chart by a specific intent or entity, select the intent or entity in the right-side filtering panel. The data points and their distribution update in the graph according to your selection.

Visualized batch test results

Chart result examples

In the chart in the LUIS portal, you can perform the following actions:

View single-point utterance data

In the chart, hover over a data point to see the certainty score of its prediction. Select a data point to retrieve its corresponding utterance in the utterance list at the bottom of the page.

Selected utterance

View section data

In the four-section chart, select the section name, such as False Positive at the top-right of the chart. All utterances in that section display in a list below the chart.

Utterances selected by section

In the preceding image, the utterance switch on is labeled with the TurnAllOn intent but received a prediction of the None intent. This indicates that the TurnAllOn intent needs more example utterances in order to make the expected prediction.

The two sections of the chart in red indicate utterances that did not match the expected prediction. These are utterances for which LUIS needs more training.

The two sections of the chart in green did match the expected prediction.

Batch testing using the REST API

LUIS lets you batch test using the LUIS portal and the REST API. The endpoints for the REST API are listed below. For information on batch testing using the LUIS portal, see Tutorial: batch test data sets. Use the complete URLs below, replacing the placeholder values with your own LUIS prediction key and endpoint.

Remember to add your LUIS key to Ocp-Apim-Subscription-Key in the header, and set Content-Type to application/json.

Start a batch test

Start a batch test using either an app version ID or a publishing slot. Send a POST request to one of the following endpoint formats, and include your batch file in the body of the request.

Publishing slot

  • <YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>/slots/<YOUR-SLOT-NAME>/evaluations

App version ID

  • <YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>/versions/<YOUR-APP-VERSION-ID>/evaluations

These endpoints return an operation ID that you will use to check the status and get the results.
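For illustration, here is a minimal sketch of the POST request against the publishing-slot endpoint using Python and the requests library. The placeholder values mirror the URL format above; exactly where the operation ID appears in the response body is not shown in this article, so inspect the printed output.

import json
import requests

# Placeholders; replace with your own prediction endpoint, key, app ID, and slot name.
PREDICTION_ENDPOINT = "<YOUR-PREDICTION-ENDPOINT>"
PREDICTION_KEY = "<YOUR-PREDICTION-KEY>"
APP_ID = "<YOUR-APP-ID>"
SLOT_NAME = "<YOUR-SLOT-NAME>"

url = f"{PREDICTION_ENDPOINT}/luis/prediction/v3.0-preview/apps/{APP_ID}/slots/{SLOT_NAME}/evaluations"
headers = {
    "Ocp-Apim-Subscription-Key": PREDICTION_KEY,
    "Content-Type": "application/json",
}

# The batch file (up to 1,000 labeled utterances) goes in the request body.
with open("pizza-with-machine-learned-entity-test.json", encoding="utf-8") as f:
    batch_body = json.load(f)

response = requests.post(url, headers=headers, json=batch_body)
response.raise_for_status()

# The response carries the operation ID used by the /status and /result calls.
print(response.status_code, response.text)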

Get the status of an ongoing batch test

Use the operation ID from the batch test you started to get its status from one of the following endpoint formats (a brief polling sketch follows the endpoints):

Publishing slot

  • <YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>/slots/<YOUR-SLOT-ID>/evaluations/<YOUR-OPERATION-ID>/status

App version ID

  • <YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>/versions/<YOUR-APP-VERSION-ID>/evaluations/<YOUR-OPERATION-ID>/status
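Because the evaluation runs asynchronously, a client typically polls the status endpoint until the operation finishes. The loop below is a sketch only: the terminal status strings ("succeeded", "failed") are assumptions, so adapt the exit condition to the payload your service actually returns.

import time
import requests

# Placeholders; substitute the same values used to start the batch test,
# plus the operation ID returned by that call.
STATUS_URL = (
    "<YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>"
    "/slots/<YOUR-SLOT-ID>/evaluations/<YOUR-OPERATION-ID>/status"
)
HEADERS = {"Ocp-Apim-Subscription-Key": "<YOUR-PREDICTION-KEY>"}

while True:
    status = requests.get(STATUS_URL, headers=HEADERS).json()
    print(status)
    if any(word in str(status).lower() for word in ("succeeded", "failed")):
        break
    time.sleep(5)  # wait a few seconds between polls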

Get the results from a batch test

Use the operation ID from the batch test you started to get its results from one of the following endpoint formats (a brief retrieval sketch follows the endpoints):

Publishing slot

  • <YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>/slots/<YOUR-SLOT-ID>/evaluations/<YOUR-OPERATION-ID>/result

App version ID

  • <YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>/versions/<YOUR-APP-VERSION-ID>/evaluations/<YOUR-OPERATION-ID>/result
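Once the status indicates the evaluation has finished, the result endpoint returns the evaluation payload. A minimal sketch, assuming the same placeholder values as above, that saves the results to a local file for review:

import json
import requests

# Placeholders; substitute your own endpoint, key, app ID, slot ID, and operation ID.
RESULT_URL = (
    "<YOUR-PREDICTION-ENDPOINT>/luis/prediction/v3.0-preview/apps/<YOUR-APP-ID>"
    "/slots/<YOUR-SLOT-ID>/evaluations/<YOUR-OPERATION-ID>/result"
)
HEADERS = {"Ocp-Apim-Subscription-Key": "<YOUR-PREDICTION-KEY>"}

response = requests.get(RESULT_URL, headers=HEADERS)
response.raise_for_status()

# Save the raw results; the objects they contain are described under
# "REST API batch test results" below.
with open("batch-test-results.json", "w", encoding="utf-8") as f:
    json.dump(response.json(), f, indent=2)
print("Results saved to batch-test-results.json")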

Batch file of utterances

Submit a batch file of utterances, known as a data set, for batch testing. The data set is a JSON-formatted file containing a maximum of 1,000 labeled utterances. You can test up to 10 data sets in an app. If you need to test more, delete a data set and then add a new one. All custom entities in the model appear in the batch test entities filter, even if there are no corresponding entities in the batch file data.

The batch file consists of utterances. Each utterance must have an expected intent prediction along with any machine-learning entities you expect to be detected.

Batch syntax template for intents with entities

Use the following template to start your batch file:

{
    "LabeledTestSetUtterances": [
        {
            "text": "play a song",
            "intent": "play_music",
            "entities": [
                {
                    "entity": "song_parent",
                    "startPos": 0,
                    "endPos": 15,
                    "children": [
                        {
                            "entity": "pre_song",
                            "startPos": 0,
                            "endPos": 3
                        },
                        {
                            "entity": "song_info",
                            "startPos": 5,
                            "endPos": 15
                        }
                    ]
                }
            ]
        }
    ]
}

The batch file uses the startPos and endPos properties to note the beginning and end of an entity. The values are zero-based and should not begin or end on a space. This is different from the query logs, which use the startIndex and endIndex properties.

If you do not want to test entities, include the entities property and set its value to an empty array, [].
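Counting these inclusive, zero-based positions by hand is error-prone. The helper below is a hypothetical convenience for building batch files; it assumes the labeled text occurs exactly once in the utterance and is already trimmed of surrounding spaces. The printed values match the ModifyOrder example earlier in this article.

def label_positions(utterance: str, entity_text: str) -> dict:
    # Return zero-based, inclusive startPos/endPos for entity_text within utterance.
    start = utterance.index(entity_text)
    return {"startPos": start, "endPos": start + len(entity_text) - 1}


utterance = "I want to pick up 1 cheese pizza"
print(label_positions(utterance, "1 cheese pizza"))  # {'startPos': 18, 'endPos': 31}
print(label_positions(utterance, "cheese"))          # {'startPos': 20, 'endPos': 25}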

REST API batch test results

There are several objects returned by the API (the metric definitions are sketched after this list):

  • Information about the intent and entity models, such as precision, recall, and F-score.
  • Information about the entity models, such as precision, recall, and F-score for each entity.
    • Using the verbose flag, you can get more information about each entity, such as entityTextFScore and entityTypeFScore.
  • Provided utterances with the predicted and labeled intent names.
  • A list of false positive entities and a list of false negative entities.
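If you want to sanity-check the reported metrics, precision, recall, and F-score follow their standard definitions and can be recomputed from true-positive, false-positive, and false-negative counts. The counts in the sketch below are made up for illustration.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    # precision = TP / (TP + FP); recall = TP / (TP + FN);
    # F-score = harmonic mean of precision and recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example: 8 correctly predicted entities, 2 spurious predictions, 1 missed entity.
print(precision_recall_f1(tp=8, fp=2, fn=1))  # (0.8, 0.888..., 0.842...)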

Next steps

If testing indicates that your LUIS app doesn't recognize the correct intents and entities, you can work to improve your LUIS app's performance by labeling more utterances or adding features.