Testing for LUIS DevOps

Software engineers who are developing a Language Understanding (LUIS) app can apply DevOps practices around source control, automated builds, testing, and release management by following these guidelines.

In agile software development methodologies, testing plays an integral role in building quality software. Every significant change to a LUIS app should be accompanied by tests that exercise the new functionality the developer is building into the app. These tests are checked into your source code repository along with the .lu source of your LUIS app. The implementation of the change is finished when the app satisfies the tests.

Tests are a critical part of CI/CD workflows. When changes to a LUIS app are proposed in a pull request (PR), or after changes are merged into your master branch, CI workflows should run the tests to verify that the updates haven't caused any regressions.

How to do Unit testing and Batch testing

There are two different kinds of testing for a LUIS app that you need to perform in continuous integration workflows:

  • Unit tests - Relatively simple tests that verify the key functionality of your LUIS app. A unit test passes when the expected intent and the expected entities are returned for a given test utterance. All unit tests must pass for the test run to complete successfully.
    This kind of testing is similar to Interactive testing that you can do in the LUIS portal.

  • Batch tests - Batch testing is a comprehensive test on your current trained model to measure its performance. Unlike unit tests, batch testing isn't pass/fail testing. The expectation with batch testing is not that every test will return the expected intent and expected entities. Instead, a batch test helps you view the accuracy of each intent and entity in your app and compare results over time as you make improvements.
    This kind of testing is the same as the Batch testing that you can perform interactively in the LUIS portal.

You can employ unit testing from the beginning of your project. Batch testing is only really of value once you've developed the schema of your LUIS app and you're working on improving its accuracy.

For both unit tests and batch tests, make sure that your test utterances are kept separate from your training utterances. If you test on the same data you train on, you'll get the false impression that your app is performing well when it's just overfitting to data it has already seen. Tests must be unseen by the model to show how well it generalizes.
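One way to guard against overlap between training and test utterances is a small check that runs before the tests themselves. The following is a minimal sketch, not part of LUIS or any LUIS tool; the function names and the sample utterances are hypothetical, and the normalization (lower-casing and collapsing whitespace) is an assumption about what should count as "the same" utterance.

```python
def normalize(utterance: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(utterance.lower().split())


def find_leaked_utterances(training: list[str], tests: list[str]) -> list[str]:
    """Return the test utterances that also appear in the training set."""
    seen = {normalize(u) for u in training}
    return [u for u in tests if normalize(u) in seen]


# Hypothetical sample data for illustration only.
training_utterances = ["Book a flight to Paris", "Cancel my reservation"]
test_utterances = ["book a flight  to paris", "I need a ticket to Rome"]

leaked = find_leaked_utterances(training_utterances, test_utterances)
print(leaked)
```

Any utterance reported here should be removed from either the training set or the test set before the results are trusted.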

Writing tests

When you write a set of tests, for each test you need to define:

  • Test utterance
  • Expected intent
  • Expected entities

Use the LUIS batch file syntax to define a group of tests in a JSON-formatted file. For example:

[
  {
    "text": "example utterance goes here",
    "intent": "intent name goes here",
    "entities":
    [
        {
            "entity": "entity name 1 goes here",
            "startPos": 14,
            "endPos": 23
        },
        {
            "entity": "entity name 2 goes here",
            "startPos": 14,
            "endPos": 23
        }
    ]
  }
]

Some test tools, such as NLU.DevOps, also support LUDown-formatted test files.
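Counting character positions by hand for startPos and endPos in the JSON batch format is error-prone, so it can help to generate test cases from the entity text instead. The sketch below is illustrative only: make_test_case, the utterance, and the intent and entity names are hypothetical, and it assumes that endPos is the zero-based index of the entity's last character (inclusive).

```python
import json


def make_test_case(text: str, intent: str, entities: dict[str, str]) -> dict:
    """Build one test case; `entities` maps an entity name to its substring."""
    labeled = []
    for name, value in entities.items():
        start = text.find(value)  # first occurrence only
        if start < 0:
            raise ValueError(f"{value!r} not found in utterance {text!r}")
        # Assumption: endPos is inclusive (index of the last character).
        labeled.append(
            {"entity": name, "startPos": start, "endPos": start + len(value) - 1}
        )
    return {"text": text, "intent": intent, "entities": labeled}


# Hypothetical utterance, intent, and entity names.
case = make_test_case(
    "Are there any janitorial jobs currently open?",
    "GetJobInformation",
    {"Job": "janitorial"},
)
print(json.dumps([case], indent=2))
```

Because the helper uses the first occurrence of each substring, an utterance that repeats the entity text would need its positions set explicitly.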

Designing unit tests

Unit tests should be designed to test the core functionality of your LUIS app. In each iteration, or sprint, of your app development, you should write a sufficient number of tests to verify that the key functionality you are implementing in that iteration is working correctly.

In each unit test, for a given test utterance, you can:

  • Test that the correct intent is returned.
  • Test that the 'key' entities - those that are critical to your solution - are being returned.
  • Test that the prediction score for the intent and entities exceeds a threshold that you define. For example, you could decide that a test passes only if the prediction score for the intent and for your key entities exceeds 0.75.

In unit tests, it's a good idea to test that your key entities have been returned in the prediction response, but to ignore any false positives. False positives are entities that are found in the prediction response but are not defined in the expected results for your test. Ignoring false positives makes unit tests less onerous to author while still letting you focus on whether the data that is key to your solution is returned in a prediction response.
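The three checks and the false-positive rule described above can be sketched as a single pass/fail function. This is a minimal illustration rather than the logic of any particular tool; the dictionary shapes for `expected` and `predicted` are simplified, hypothetical structures, not the LUIS response format, and the names are invented for the example.

```python
THRESHOLD = 0.75  # the example threshold from the text; choose your own


def unit_test_passes(expected: dict, predicted: dict,
                     threshold: float = THRESHOLD) -> bool:
    """Return True when the intent and every key entity are predicted
    with a score that exceeds the threshold."""
    # 1. The correct intent must be returned with a high enough score.
    if predicted["intent"] != expected["intent"]:
        return False
    if predicted["intent_score"] <= threshold:
        return False
    # 2. Every expected (key) entity must be present with a high enough score.
    found = {(e["entity"], e["text"]): e["score"] for e in predicted["entities"]}
    for entity in expected["entities"]:
        score = found.get((entity["entity"], entity["text"]))
        if score is None or score <= threshold:
            return False
    # 3. Extra predicted entities (false positives) are deliberately ignored.
    return True


# Hypothetical test case and prediction.
expected = {"intent": "BookFlight",
            "entities": [{"entity": "City", "text": "Paris"}]}
predicted = {"intent": "BookFlight", "intent_score": 0.92,
             "entities": [{"entity": "City", "text": "Paris", "score": 0.88},
                          {"entity": "Airline", "text": "book", "score": 0.31}]}

print(unit_test_passes(expected, predicted))
```

The low-scoring "Airline" entity is a false positive: it is absent from the expected results, so it does not affect the outcome.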

Tip

The NLU.DevOps tool supports all your LUIS testing needs. When used in unit test mode, the compare command asserts that all tests pass and ignores false positive results for entities that are not labeled in the expected results.

Designing Batch tests

Batch test sets should contain a large number of test cases, designed to test across all intents and all entities in your LUIS app. See Batch testing in the LUIS portal for information on defining a batch test set.

Running tests

The LUIS portal offers features to help with interactive testing:

  • Interactive testing allows you to submit a sample utterance and get a response of LUIS-recognized intents and entities. You verify the success of the test by visual inspection.

  • Batch testing uses a batch test file as input to validate your active trained version and measure its prediction accuracy. A batch test helps you view the accuracy of each intent and entity in your active version, displaying results with a chart.

Running tests in an automated build workflow

The interactive testing features in the LUIS portal are useful, but for DevOps, automated testing performed in a CI/CD workflow brings certain requirements:

  • Test tools must run in a workflow step on a build server. This means the tools must be able to run on the command line.
  • The test tools must be able to execute a group of tests against an endpoint and automatically verify the expected results against the actual results.
  • If the tests fail, the test tools must return a status code to halt the workflow and "fail the build".
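The last requirement comes down to the process exit code: CI systems generally treat a non-zero exit code from a workflow step as a failure. The following sketch shows the idea for a home-grown test runner; the `results` structure and the function name are hypothetical, and this is not how NLU.DevOps is implemented.

```python
import sys


def exit_code_for(results: list[dict]) -> int:
    """Return 0 when every test passed, 1 otherwise, logging the failures."""
    failed = [r for r in results if not r["passed"]]
    for r in failed:
        print(f"FAILED: {r['utterance']}", file=sys.stderr)
    return 1 if failed else 0


# Hypothetical results from a test run.
results = [
    {"utterance": "book a flight to Paris", "passed": True},
    {"utterance": "cancel my reservation", "passed": False},
]

code = exit_code_for(results)
print(code)
# In a real workflow step, the script would end with sys.exit(code) so that
# a non-zero value halts the workflow and fails the build.
```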

LUIS does not offer a command-line tool or a high-level API that provides these features. We recommend that you use the NLU.DevOps tool to run tests and verify results, both at the command line and during automated testing within a CI/CD workflow.

The testing capabilities that are available in the LUIS portal don't require a published endpoint and are a part of the LUIS authoring capabilities. When you're implementing testing in an automated build workflow, you must publish the LUIS app version to be tested to an endpoint so that test tools such as NLU.DevOps can send prediction requests as part of testing.

Tip

  • If you're implementing your own testing solution and writing code to send test utterances to an endpoint, remember that if you are using the LUIS authoring key, the allowed transaction rate is limited to 5 transactions per second (TPS). Either throttle the sending rate or use a prediction key instead.
  • When sending test queries to an endpoint, remember to use log=false in the query string of your prediction request. This ensures that your test utterances are not logged by LUIS. Logged utterances end up in the endpoint utterances review list presented by the LUIS active learning feature and, from there, can accidentally be added to the training utterances of your app.
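Both tips can be combined in a home-grown test sender. The sketch below only builds the request URLs and paces the loop; it does not send anything. The URL shape is an assumption based on the LUIS V3 prediction endpoint, and the endpoint host, app ID, slot, and key are placeholders, so check them against the API reference for your resource.

```python
import time
from urllib.parse import urlencode

MAX_TPS = 5  # limit stated above for the LUIS authoring key


def build_prediction_url(endpoint: str, app_id: str, slot: str,
                         key: str, utterance: str) -> str:
    """Build a prediction URL that asks LUIS not to log the query."""
    query = urlencode(
        {"subscription-key": key, "query": utterance, "log": "false"}
    )
    # Assumed V3 prediction path; verify against the prediction API reference.
    return f"{endpoint}/luis/prediction/v3.0/apps/{app_id}/slots/{slot}/predict?{query}"


def throttled(items, max_per_second: int = MAX_TPS):
    """Yield items no faster than max_per_second."""
    interval = 1.0 / max_per_second
    last = None
    for item in items:
        if last is not None:
            wait = interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item


# Placeholder values for illustration only.
utterances = ["book a flight to Paris", "cancel my reservation"]
urls = [
    build_prediction_url("https://YOUR-RESOURCE.cognitiveservices.azure.com",
                         "YOUR-APP-ID", "staging", "YOUR-KEY", u)
    for u in throttled(utterances)
]
print(urls[0])
```

Each URL would then be requested with the HTTP client of your choice inside the throttled loop.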

Running Unit tests at the command line and in CI/CD workflows

You can use the NLU.DevOps package to run tests at the command line:

  • Use the NLU.DevOps test command to submit tests from a test file to an endpoint and to capture the actual prediction results in a file.
  • Use the NLU.DevOps compare command to compare the actual results with the expected results defined in the input test file. The compare command generates NUnit test output and, when run in unit test mode with the --unit-test flag, asserts that all tests pass.

Running Batch tests at the command line and in CI/CD workflows

You can also use the NLU.DevOps package to run batch tests at the command line.

  • Use the NLU.DevOps test command to submit tests from a test file to an endpoint and to capture the actual prediction results in a file, same as with unit tests.
  • Use the NLU.DevOps compare command in Performance test mode to measure the performance of your app. You can also compare the performance of your app against a baseline performance benchmark, for example, the results from the latest commit to master or the current release. In Performance test mode, the compare command generates NUnit test output and batch test results in JSON format.
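To make the idea of measuring per-intent performance and comparing it with a baseline concrete, the sketch below computes precision, recall, and F1 for each intent and lists the intents whose F1 dropped. It illustrates the concept only: it is not the NLU.DevOps implementation or its output format, and the function names, the choice of F1 as the compared metric, and the sample data are all assumptions.

```python
from collections import Counter


def per_intent_scores(expected: list[str], predicted: list[str]) -> dict:
    """Compute precision, recall, and F1 per intent from paired labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for exp, pred in zip(expected, predicted):
        if exp == pred:
            tp[exp] += 1
        else:
            fn[exp] += 1   # the expected intent was missed
            fp[pred] += 1  # the predicted intent was wrong
    scores = {}
    for intent in set(expected) | set(predicted):
        p_den = tp[intent] + fp[intent]
        r_den = tp[intent] + fn[intent]
        precision = tp[intent] / p_den if p_den else 0.0
        recall = tp[intent] / r_den if r_den else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[intent] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def regressions(current: dict, baseline: dict, tolerance: float = 0.0) -> list[str]:
    """List intents whose F1 fell below the baseline by more than tolerance."""
    return sorted(
        intent for intent, score in baseline.items()
        if current.get(intent, {"f1": 0.0})["f1"] < score["f1"] - tolerance
    )


# Hypothetical labels: the baseline run was perfect; the current run
# mistakes one "Book" utterance for "Cancel".
expected = ["Book", "Book", "Cancel", "Cancel", "None"]
baseline = per_intent_scores(expected, ["Book", "Book", "Cancel", "Cancel", "None"])
current = per_intent_scores(expected, ["Book", "Cancel", "Cancel", "Cancel", "None"])

print(regressions(current, baseline))
```

A single misclassification lowers the recall of the missed intent and the precision of the wrongly predicted one, which is why both intents appear in the result.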

LUIS non-deterministic training and the effect on testing

When LUIS is training a model, such as an intent, it needs both positive data (the labeled training utterances that you've supplied to train the app for the model) and negative data (data that is not a valid example of the usage of that model). During training, LUIS builds the negative data of one model from all the positive data you've supplied for the other models, but in some cases that can produce a data imbalance. To avoid this imbalance, LUIS samples a subset of the negative data in a non-deterministic fashion to optimize for a better balanced training set, improved model performance, and faster training time.

The result of this non-deterministic training is that you may get a slightly different prediction response between different training sessions, usually for intents and/or entities where the prediction score is not high.

If you want to disable non-deterministic training for those LUIS app versions that you're building for the purpose of testing, use the Version settings API with the UseAllTrainingData setting set to true.
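As a rough illustration, the request body for that call might be built as follows. This sketch assumes the Version settings API accepts a JSON array of name/value pairs with the value given as a string; the function name is hypothetical, and the request URL, HTTP method, and authentication are omitted on purpose, so take them from the API reference.

```python
import json


def version_settings_payload(use_all_training_data: bool) -> str:
    """Build a JSON body that sets UseAllTrainingData (assumed shape)."""
    settings = [
        {
            "name": "UseAllTrainingData",
            # Assumption: the API expects the value as a string.
            "value": "true" if use_all_training_data else "false",
        }
    ]
    return json.dumps(settings)


payload = version_settings_payload(True)
print(payload)
```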

Next steps