如何使用自定义实体模式匹配识别意向

项目
04/09/2024

Azure AI 服务语音 SDK 内置了一项功能，可通过简单语言模式匹配来提供意向识别。意向是用户想要执行的操作：关闭窗口、标记复选框、插入一些文本等。

在本指南中，我们将使用语音 SDK 开发一个控制台应用程序，用于通过设备的麦克风从说出的语音语句中派生意向。学习如何：

创建引用语音 SDK NuGet 包的 Visual Studio 项目
创建语音配置并获取意向识别器
通过语音 SDK API 添加意向和模式
通过语音 SDK API 添加自定义实体
使用异步的事件驱动的连续识别

何时使用模式匹配

在以下情况下使用模式匹配：

你只对严格匹配用户所说的内容感兴趣。这些模式比对话语言理解 (CLU) 更严格。
你没有访问 CLU 模式的权限，但仍需要意向。

有关详细信息，请参阅模式匹配概述。

先决条件

在开始阅读本指南之前，请务必准备好以下各项：

Azure AI 服务资源或统一语音资源
Visual Studio 2019（版本不限）。

创建项目

在 Visual Studio 2019 中创建新的 C# 控制台应用程序项目并安装语音 SDK。

从一些样本代码入手

让我们打开 Program.cs 并添加一些代码作为项目的框架。

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

namespace helloworld
{
    class Program
    {
        static void Main(string[] args)
        {
            IntentPatternMatchingWithMicrophoneAsync().Wait();
        }

        private static async Task IntentPatternMatchingWithMicrophoneAsync()
        {
            var config = SpeechConfig.FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
        }
    }
}

创建语音配置

在初始化 IntentRecognizer 对象之前，需要创建一个配置来使用 Azure AI 服务预测资源的密钥和 Azure 区域。

将 "YOUR_SUBSCRIPTION_KEY" 替换为 Azure AI 服务预测密钥。
将 "YOUR_SUBSCRIPTION_REGION" 替换为 Azure AI 服务资源区域。

此示例使用 FromSubscription() 方法来生成 SpeechConfig。有关可用方法的完整列表，请参阅 SpeechConfig 类。

初始化 IntentRecognizer

现在创建一个 IntentRecognizer。将此代码插入语音配置下。

using (var recognizer = new IntentRecognizer(config))
{
    
}

添加一些意向

需要将一些模式与 PatternMatchingModel 关联并将其应用于 IntentRecognizer。首先创建一个 PatternMatchingModel 并向其添加一些意向。

注意

可以向一个 PatternMatchingIntent 添加多个模式。

在 using 块内插入此代码：

// Creates a Pattern Matching model and adds specific intents from your model. The
// Id is used to identify this model from others in the collection.
var model = new PatternMatchingModel("YourPatternMatchingModelId");

// Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
var patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

// Creates a pattern that uses an optional entity and group that could be used to tie commands together.
var patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

// You can also have multiple entities of the same name in a single pattern by adding appending a unique identifier
// to distinguish between the instances. For example:
var patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
// NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
//       and is separated from the entity name by a ':'

// Creates the pattern matching intents and adds them to the model
model.Intents.Add(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
model.Intents.Add(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

添加一些自定义实体

若要充分利用模式匹配程序，可以自定义实体。列出“floorName”的可用楼层。我们还使“parkingLevel”成为整数实体。

将下面的代码插入到你的意向下方：

// Creates the "floorName" entity and set it to type list.
// Adds acceptable values. NOTE the default entity type is Any and so we do not need
// to declare the "action" entity.
model.Entities.Add(PatternMatchingEntity.CreateListEntity("floorName", EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

// Creates the "parkingLevel" entity as a pre-built integer
model.Entities.Add(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

将模型应用于识别器

现在，需要将模型应用于 IntentRecognizer。可以一次使用多个模型，以便 API 采用模型集合。

将下面的代码插入到你的实体下方：

var modelCollection = new LanguageUnderstandingModelCollection();
modelCollection.Add(model);

recognizer.ApplyLanguageModels(modelCollection);

识别意向

在 IntentRecognizer 对象中，我们将调用 RecognizeOnceAsync() 方法。此方法要求语音服务识别单个短语中的语音，并在识别到短语后停止识别语音。

应用语言模型后插入以下代码：

Console.WriteLine("Say something...");

var result = await recognizer.RecognizeOnceAsync();

显示识别结果（或错误）

当语音服务返回识别结果后，我们将输出结果。

将此代码插在 var result = await recognizer.RecognizeOnceAsync(); 下：

if (result.Reason == ResultReason.RecognizedIntent)
{
    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    Console.WriteLine($"       Intent Id={result.IntentId}.");

    var entities = result.Entities;
    switch (result.IntentId)
    {
        case "ChangeFloors":
            if (entities.TryGetValue("floorName", out string floorName))
            {
                Console.WriteLine($"       FloorName={floorName}");
            }

            if (entities.TryGetValue("floorName:1", out floorName))
            {
                Console.WriteLine($"     FloorName:1={floorName}");
            }

            if (entities.TryGetValue("floorName:2", out floorName))
            {
                Console.WriteLine($"     FloorName:2={floorName}");
            }

            if (entities.TryGetValue("parkingLevel", out string parkingLevel))
            {
                Console.WriteLine($"    ParkingLevel={parkingLevel}");
            }

            break;

        case "DoorControl":
            if (entities.TryGetValue("action", out string action))
            {
                Console.WriteLine($"          Action={action}");
            }
            break;
    }
}
else if (result.Reason == ResultReason.RecognizedSpeech)
{
    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    Console.WriteLine($"    Intent not recognized.");
}
else if (result.Reason == ResultReason.NoMatch)
{
    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
else if (result.Reason == ResultReason.Canceled)
{
    var cancellation = CancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

    if (cancellation.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
    }
}

查看代码

此时，代码应如下所示：

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

namespace helloworld
{
    class Program
    {
        static void Main(string[] args)
        {
            IntentPatternMatchingWithMicrophoneAsync().Wait();
        }

        private static async Task IntentPatternMatchingWithMicrophoneAsync()
        {
            var config = SpeechConfig.FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");

            using (var recognizer = new IntentRecognizer(config))
            {
                // Creates a Pattern Matching model and adds specific intents from your model. The
                // Id is used to identify this model from others in the collection.
                var model = new PatternMatchingModel("YourPatternMatchingModelId");

                // Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
                var patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

                // Creates a pattern that uses an optional entity and group that could be used to tie commands together.
                var patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

                // You can also have multiple entities of the same name in a single pattern by adding appending a unique identifier
                // to distinguish between the instances. For example:
                var patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
                // NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
                //       and is separated from the entity name by a ':'

                // Adds some intents to look for specific patterns.
                model.Intents.Add(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
                model.Intents.Add(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

                // Creates the "floorName" entity and set it to type list.
                // Adds acceptable values. NOTE the default entity type is Any and so we do not need
                // to declare the "action" entity.
                model.Entities.Add(PatternMatchingEntity.CreateListEntity("floorName", EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

                // Creates the "parkingLevel" entity as a pre-built integer
                model.Entities.Add(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

                var modelCollection = new LanguageUnderstandingModelCollection();
                modelCollection.Add(model);

                recognizer.ApplyLanguageModels(modelCollection);

                Console.WriteLine("Say something...");

                var result = await recognizer.RecognizeOnceAsync();

                if (result.Reason == ResultReason.RecognizedIntent)
                {
                    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                    Console.WriteLine($"       Intent Id={result.IntentId}.");

                    var entities = result.Entities;
                    switch (result.IntentId)
                    {
                        case "ChangeFloors":
                            if (entities.TryGetValue("floorName", out string floorName))
                            {
                                Console.WriteLine($"       FloorName={floorName}");
                            }

                            if (entities.TryGetValue("floorName:1", out floorName))
                            {
                                Console.WriteLine($"     FloorName:1={floorName}");
                            }

                            if (entities.TryGetValue("floorName:2", out floorName))
                            {
                                Console.WriteLine($"     FloorName:2={floorName}");
                            }

                            if (entities.TryGetValue("parkingLevel", out string parkingLevel))
                            {
                                Console.WriteLine($"    ParkingLevel={parkingLevel}");
                            }

                            break;

                        case "DoorControl":
                            if (entities.TryGetValue("action", out string action))
                            {
                                Console.WriteLine($"          Action={action}");
                            }
                            break;
                    }
                }
                else if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                    Console.WriteLine($"    Intent not recognized.");
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                }
            }
        }
    }
}

生成并运行应用

现在，可以使用语音服务构建应用并测试语音识别。

编译代码- 在 Visual Studio 菜单栏中，选择“生成”>“生成解决方案” 。
启动应用 - 在菜单栏中，选择“调试”>“开始调试”，或按 F5 。
开始识别 - 它将提示你说点什么。默认语言为英语。语音将发送到语音服务，转录为文本，并在控制台中呈现。

例如，如果你说“Take me to floor 2”，则输出如下所示：

Say something...
RECOGNIZED: Text=Take me to floor 2.
       Intent Id=ChangeFloors.
       FloorName=2

作为另一个示例，如果你说“Take me to floor 7”，则输出如下所示：

Say something...
RECOGNIZED: Text=Take me to floor 7.
    Intent not recognized.

未识别任何意向，因为 floorName 的有效值列表中没有 7。

创建项目

在 Visual Studio 2019 中创建新的 C++ 控制台应用程序项目并安装语音 SDK。

从一些样本代码入手

让我们打开 helloworld.cpp 并添加一些代码作为项目的框架。

#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Intent;

int main()
{
    std::cout << "Hello World!\n";

    auto config = SpeechConfig::FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
}

创建语音配置

在初始化 IntentRecognizer 对象之前，需要创建一个配置来使用 Azure AI 服务预测资源的密钥和 Azure 区域。

将 "YOUR_SUBSCRIPTION_KEY" 替换为 Azure AI 服务预测密钥。
将 "YOUR_SUBSCRIPTION_REGION" 替换为 Azure AI 服务资源区域。

此示例使用 FromSubscription() 方法来生成 SpeechConfig。有关可用方法的完整列表，请参阅 SpeechConfig 类。

初始化 IntentRecognizer

现在创建一个 IntentRecognizer。将此代码插入语音配置下。

    auto intentRecognizer = IntentRecognizer::FromConfig(config);

添加一些意向

需要将一些模式与 PatternMatchingModel 关联并将其应用于 IntentRecognizer。首先创建一个 PatternMatchingModel 并向其添加一些意向。 PatternMatchingIntent 是一种结构，因此使用其内联语法即可。

注意

可以向一个 PatternMatchingIntent 添加多个模式。

auto model = PatternMatchingModel::FromId("myNewModel");

model->Intents.push_back({"Take me to floor {floorName}.", "Go to floor {floorName}."} , "ChangeFloors");
model->Intents.push_back({"{action} the door."}, "OpenCloseDoor");

添加一些自定义实体

若要充分利用模式匹配程序，可以自定义实体。列出“floorName”的可用楼层。

model->Entities.push_back({ "floorName" , Intent::EntityType::List, Intent::EntityMatchMode::Strict, {"one", "1", "two", "2", "lobby", "ground floor"} });

将模型应用于识别器

现在，需要将模型应用于 IntentRecognizer。可以一次使用多个模型，以便 API 采用模型集合。

std::vector<std::shared_ptr<LanguageUnderstandingModel>> collection;

collection.push_back(model);
intentRecognizer->ApplyLanguageModels(collection);

识别意向

在 IntentRecognizer 对象中，我们将调用 RecognizeOnceAsync() 方法。此方法要求语音服务识别单个短语中的语音，并在识别到短语后停止识别语音。此为简写内容，今后将回头补充。

将下面的代码插入到你的意向下方：

std::cout << "Say something ..." << std::endl;
auto result = intentRecognizer->RecognizeOnceAsync().get();

显示识别结果（或错误）

当语音服务返回识别结果后，我们将输出结果。

将此代码插在 auto result = intentRecognizer->RecognizeOnceAsync().get(); 下：

switch (result->Reason)
{
case ResultReason::RecognizedSpeech:
        std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
        std::cout << "NO INTENT RECOGNIZED!" << std::endl;
        break;
case ResultReason::RecognizedIntent:
    std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
    std::cout << "  Intent Id = " << result->IntentId.c_str() << std::endl;
    auto entities = result->GetEntities();
    if (entities.find("floorName") != entities.end())
    {
        std::cout << "  Floor name: = " << entities["floorName"].c_str() << std::endl;
    }

    if (entities.find("action") != entities.end())
    {
        std::cout << "  Action: = " << entities["action"].c_str() << std::endl;
    }

    break;
case ResultReason::NoMatch:
{
    auto noMatch = NoMatchDetails::FromResult(result);
    switch (noMatch->Reason)
    {
    case NoMatchReason::NotRecognized:
        std::cout << "NOMATCH: Speech was detected, but not recognized." << std::endl;
        break;
    case NoMatchReason::InitialSilenceTimeout:
        std::cout << "NOMATCH: The start of the audio stream contains only silence, and the service timed out waiting for speech." << std::endl;
        break;
    case NoMatchReason::InitialBabbleTimeout:
        std::cout << "NOMATCH: The start of the audio stream contains only noise, and the service timed out waiting for speech." << std::endl;
        break;
    case NoMatchReason::KeywordNotRecognized:
        std::cout << "NOMATCH: Keyword not recognized" << std::endl;
        break;
    }
    break;
}
case ResultReason::Canceled:
{
    auto cancellation = CancellationDetails::FromResult(result);

    if (!cancellation->ErrorDetails.empty())
    {
        std::cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails.c_str() << std::endl;
        std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
    }
}
default:
    break;
}

查看代码

此时，代码应如下所示：

#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Intent;

int main()
{
    auto config = SpeechConfig::FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
    auto intentRecognizer = IntentRecognizer::FromConfig(config);

    auto model = PatternMatchingModel::FromId("myNewModel");

    model->Intents.push_back({"Take me to floor {floorName}.", "Go to floor {floorName}."} , "ChangeFloors");
    model->Intents.push_back({"{action} the door."}, "OpenCloseDoor");

    model->Entities.push_back({ "floorName" , Intent::EntityType::List, Intent::EntityMatchMode::Strict, {"one", "1", "two", "2", "lobby", "ground floor"} });

    std::vector<std::shared_ptr<LanguageUnderstandingModel>> collection;

    collection.push_back(model);
    intentRecognizer->ApplyLanguageModels(collection);

    std::cout << "Say something ..." << std::endl;

    auto result = intentRecognizer->RecognizeOnceAsync().get();

    switch (result->Reason)
    {
    case ResultReason::RecognizedSpeech:
        std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
        std::cout << "NO INTENT RECOGNIZED!" << std::endl;
        break;
    case ResultReason::RecognizedIntent:
        std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
        std::cout << "  Intent Id = " << result->IntentId.c_str() << std::endl;
        auto entities = result->GetEntities();
        if (entities.find("floorName") != entities.end())
        {
            std::cout << "  Floor name: = " << entities["floorName"].c_str() << std::endl;
        }

        if (entities.find("action") != entities.end())
        {
            std::cout << "  Action: = " << entities["action"].c_str() << std::endl;
        }

        break;
    case ResultReason::NoMatch:
    {
        auto noMatch = NoMatchDetails::FromResult(result);
        switch (noMatch->Reason)
        {
        case NoMatchReason::NotRecognized:
            std::cout << "NOMATCH: Speech was detected, but not recognized." << std::endl;
            break;
        case NoMatchReason::InitialSilenceTimeout:
            std::cout << "NOMATCH: The start of the audio stream contains only silence, and the service timed out waiting for speech." << std::endl;
            break;
        case NoMatchReason::InitialBabbleTimeout:
            std::cout << "NOMATCH: The start of the audio stream contains only noise, and the service timed out waiting for speech." << std::endl;
            break;
        case NoMatchReason::KeywordNotRecognized:
            std::cout << "NOMATCH: Keyword not recognized." << std::endl;
            break;
        }
        break;
    }
    case ResultReason::Canceled:
    {
        auto cancellation = CancellationDetails::FromResult(result);

        if (!cancellation->ErrorDetails.empty())
        {
            std::cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails.c_str() << std::endl;
            std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
        }
    }
    default:
        break;
    }
}

生成并运行应用

现在，可以使用语音服务构建应用并测试语音识别。

编译代码- 在 Visual Studio 菜单栏中，选择“生成”>“生成解决方案” 。
启动应用 - 在菜单栏中，选择“调试”>“开始调试”，或按 F5 。
开始识别 - 它将提示你说点什么。默认语言为英语。语音将发送到语音服务，转录为文本，并在控制台中呈现。

例如，如果你说“Take me to floor 2”，则输出如下所示：

Say something ...
RECOGNIZED: Text = Take me to floor 2.
  Intent Id = ChangeFloors
  Floor name: = 2

再举一例，如果你说“Take me to floor 7”，则输出如下所示：

Say something ...
RECOGNIZED: Text = Take me to floor 7.
NO INTENT RECOGNIZED!

意向 ID 为空，因为 7 不在列表中。

参考文档 | GitHub 上的其他示例

在本快速入门中，我们安装适用于 Java 的语音 SDK。

平台要求

选择目标环境：

Java 运行时
Android

适用于 Java 的语音 SDK 与 Windows、Linux 和 macOS 兼容。

在 Windows 上，你必须使用 64 位目标体系结构。需要 Windows 10 或更高版本。

安装适用于你的平台的 Microsoft Visual C++ Redistributable for Visual Studio 2015、2017、2019 和 2022。首次安装此包时，可能需要重启。

适用于 Java 的语音 SDK 不支持 ARM64 上的 Windows。

注意

本文引用了 CentOS，这是一个接近生命周期结束 (EOL) 状态的 Linux 发行版。请相应地考虑你的使用和规划。有关详细信息，请参阅 CentOS 生命周期结束指南。

适用于 Java 的语音 SDK 支持 x64、ARM32 (Debian/Ubuntu) 和 ARM64 (Debian/Ubuntu) 体系结构上的以下发行版：

Ubuntu 18.04/20.04
Debian 10/11
Red Hat Enterprise Linux (RHEL) 7/8
CentOS 7

重要

使用 Linux 分发版的最新 LTS 版本。例如，如果使用 Ubuntu 20.04 LTS，请使用最新版本的 Ubuntu 20.04.X。

语音 SDK 依赖于以下 Linux 系统库：

GNU C 库的共享库（包括 POSIX 线程编程库 libpthreads）。
OpenSSL 库 (libssl) 版本 1.x 和证书 (ca-certificates)。
ALSA 应用程序的共享库 (libasound)。

还应安装 ca-certificates 以建立安全的 Websocket 并避免此 WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED 错误。

重要

语音 SDK 尚不支持 OpenSSL 3.0（Ubuntu 22.04 和 Debian 12 中的默认版本）。

运行以下命令：

sudo apt-get update
sudo apt-get install build-essential libssl-dev ca-certificates libasound2 wget

若要在 Alpine Linux 中使用语音 SDK，请按照 Alpine Linux Wiki 中的运行 glibc 程序所述创建 Debian chroot 环境。然后按照此处的 Debian 说明操作。

sudo apt-get update
sudo apt-get install build-essential libssl-dev ca-certificates libasound2 wget

注意

安装开发工具和库：

sudo yum update
sudo yum groupinstall "Development tools"
sudo yum install alsa-lib openssl wget

重要

在 RHEL/CentOS 7 上，按照如何为语音 SDK 配置 RHEL/CentOS 7 上的说明进行操作。
在 RHEL 上，按照如何配置 OpenSSL for Linux 中的说明进行操作。

安装 Java 开发工具包，例如 Azul Zulu OpenJDK。 Microsoft Build of OpenJDK 或你喜欢的 JDK 应该也能正常工作。

安装适用于 Java 的语音 SDK

某些说明使用特定的 SDK 版本，例如 1.24.2。若要查看最新版本，请搜索我们的 GitHub 存储库。

选择目标环境：

Java 运行时
Android

本指南介绍如何在 Java 运行时上安装用于 Java 的语音 SDK。

支持的操作系统

用于 Java 包的语音 SDK 适用于以下操作系统：

Windows：仅限 64 位。
Mac：macOS X 版本 10.14 或更高版本。
Linux：请参阅受支持的 Linux 分发和目标体系结构。

按照以下步骤使用 Apache Maven 安装适用于 Java 的语音 SDK：

安装 Apache Maven。
在需要新项目的位置打开命令提示符，并创建一个新的 pom.xml 文件。

将以下 XML 内容复制到 pom.xml 中：

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.microsoft.cognitiveservices.speech.samples</groupId>
    <artifactId>quickstart-eclipse</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.7.0</version>
            <configuration>
            <source>1.8</source>
            <target>1.8</target>
            </configuration>
        </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
        <groupId>com.microsoft.cognitiveservices.speech</groupId>
        <artifactId>client-sdk</artifactId>
        <version>1.37.0</version>
        </dependency>
    </dependencies>
</project>

若要安装语音 SDK 和依赖项，请运行以下 Maven 命令。
```
mvn clean dependency:copy-dependencies
```

创建 Eclipse 项目并安装语音 SDK

安装 Eclipse Java IDE。此 IDE 需要已安装 Java。
启动 Eclipse。
在 Eclipse Launcher 中，在“工作区”框中输入某个新工作区目录的名称。然后选择“启动”。
片刻之后，Eclipse IDE 的主窗口将会显示。关闭欢迎屏幕（如果存在）。
从 Eclipse 菜单中，选择“文件”>“新建”>“项目”。
将显示“新建项目”对话框。选择“Java 项目”，然后选择“下一步”。
此时将启动“新建 Java 项目”向导。在“项目名称”字段中，输入“快速入门”。选择“JavaSE-1.8”作为执行环境。选择“完成”。
如果出现了“打开关联的透视图?”窗口，请选择“打开透视图”。
在“包资源管理器”中，右键单击“快速入门”项目。从上下文菜单中选择“配置”>“转换为 Maven 项目”。
此时将显示“新建 POM”窗口。在“组 ID”字段中，输入“com.microsoft.cognitiveservices.speech.samples”。在“项目 ID”字段中，输入“快速入门”。然后选择“完成”。

打开 pom.xml 文件并对其进行编辑：

在文件末尾，在结束标记 </project> 前面添加一个 dependencies 元素，并将语音 SDK 作为依赖项：

<dependencies>
  <dependency>
    <groupId>com.microsoft.cognitiveservices.speech</groupId>
    <artifactId>client-sdk</artifactId>
    <version>1.37.0</version>
  </dependency>
</dependencies>

保存更改。

Gradle 配置

Gradle 配置需要显式引用 .jar 依赖项扩展：

// build.gradle

dependencies {
    implementation group: 'com.microsoft.cognitiveservices.speech', name: 'client-sdk', version: "1.37.0", ext: "jar"
}

从一些样本代码入手

从 src dir 打开 Main.java。
将文件的内容替换为以下内容：

import java.util.ArrayList;
import java.util.Dictionary;
import java.util.concurrent.ExecutionException;


import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.intent.*;

public class Main {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        IntentPatternMatchingWithMicrophone();
    }

    public static void IntentPatternMatchingWithMicrophone() throws InterruptedException, ExecutionException {
        SpeechConfig config = SpeechConfig.fromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
    }
}

创建语音配置

在初始化 IntentRecognizer 对象之前，需要创建一个配置来使用 Azure AI 服务预测资源的密钥和 Azure 区域。

将 "YOUR_SUBSCRIPTION_KEY" 替换为 Azure AI 服务预测密钥。
将 "YOUR_SUBSCRIPTION_REGION" 替换为 Azure AI 服务资源区域。

此示例使用 fromSubscription() 方法来生成 SpeechConfig。有关可用方法的完整列表，请参阅 SpeechConfig 类。

初始化 IntentRecognizer

现在创建一个 IntentRecognizer。将此代码插入语音配置下。我们尝试执行此操作，以便利用可自动封闭的接口。

try (IntentRecognizer recognizer = new IntentRecognizer(config)) {

}

添加一些意向

需要将一些模式与 PatternMatchingModel 关联并将其应用于 IntentRecognizer。首先创建一个 PatternMatchingModel 并向其添加一些意向。

注意

可以向一个 PatternMatchingIntent 添加多个模式。

在 try 块内插入此代码：

// Creates a Pattern Matching model and adds specific intents from your model. The
// Id is used to identify this model from others in the collection.
PatternMatchingModel model = new PatternMatchingModel("YourPatternMatchingModelId");

// Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
String patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

// Creates a pattern that uses an optional entity and group that could be used to tie commands together.
String patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

// You can also have multiple entities of the same name in a single pattern by adding appending a unique identifier
// to distinguish between the instances. For example:
String patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
// NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
//       and is separated from the entity name by a ':'

// Creates the pattern matching intents and adds them to the model
model.getIntents().put(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
model.getIntents().put(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

添加一些自定义实体

若要充分利用模式匹配程序，可以自定义实体。列出“floorName”的可用楼层。我们还使“parkingLevel”成为整数实体。

将下面的代码插入到你的意向下方：

// Creates the "floorName" entity and set it to type list.
// Adds acceptable values. NOTE the default entity type is Any and so we do not need
// to declare the "action" entity.
model.getEntities().put(PatternMatchingEntity.CreateListEntity("floorName", PatternMatchingEntity.EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

// Creates the "parkingLevel" entity as a pre-built integer
model.getEntities().put(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

将模型应用于识别器

现在，需要将模型应用于 IntentRecognizer。可以一次使用多个模型，以便 API 采用模型集合。

将下面的代码插入到你的实体下方：

ArrayList<LanguageUnderstandingModel> modelCollection = new ArrayList<LanguageUnderstandingModel>();
modelCollection.add(model);

recognizer.applyLanguageModels(modelCollection);

识别意向

在 IntentRecognizer 对象中，我们将调用 RecognizeOnceAsync() 方法。此方法要求语音服务识别单个短语中的语音，并在识别到短语后停止识别语音。

应用语言模型后插入以下代码：

System.out.println("Say something...");

IntentRecognitionResult result = recognizer.recognizeOnceAsync().get();

显示识别结果（或错误）

当语音服务返回识别结果后，我们将输出结果。

将此代码插在 IntentRecognitionResult result = recognizer.recognizeOnceAsync.get(); 下：

if (result.getReason() == ResultReason.RecognizedSpeech) {
    System.out.println("RECOGNIZED: Text= " + result.getText());
    System.out.println(String.format("%17s", "Intent not recognized."));
}
else if (result.getReason() == ResultReason.RecognizedIntent)
{
    System.out.println("RECOGNIZED: Text= " + result.getText());
    System.out.println(String.format("%17s %s", "Intent Id=", result.getIntentId() + "."));
    Dictionary<String, String> entities = result.getEntities();

    switch (result.getIntentId())
    {
        case "ChangeFloors":
            if (entities.get("floorName") != null) {
                System.out.println(String.format("%17s %s", "FloorName=", entities.get("floorName")));
            }
            if (entities.get("floorName:1") != null) {
                System.out.println(String.format("%17s %s", "FloorName:1=", entities.get("floorName:1")));
            }
            if (entities.get("floorName:2") != null) {
                System.out.println(String.format("%17s %s", "FloorName:2=", entities.get("floorName:2")));
            }
            if (entities.get("parkingLevel") != null) {
                System.out.println(String.format("%17s %s", "ParkingLevel=", entities.get("parkingLevel")));
            }
            break;
        case "DoorControl":
            if (entities.get("action") != null) {
                System.out.println(String.format("%17s %s", "Action=", entities.get("action")));
            }
            break;
    }
}
else if (result.getReason() == ResultReason.NoMatch) {
    System.out.println("NOMATCH: Speech could not be recognized.");
}
else if (result.getReason() == ResultReason.Canceled) {
    CancellationDetails cancellation = CancellationDetails.fromResult(result);
    System.out.println("CANCELED: Reason=" + cancellation.getReason());

    if (cancellation.getReason() == CancellationReason.Error)
    {
        System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
        System.out.println("CANCELED: Did you update the subscription info?");
    }
}

查看代码

此时，代码应如下所示：

package quickstart;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.Dictionary;

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.intent.*;

public class Main {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        IntentPatternMatchingWithMicrophone();
    }

    public static void IntentPatternMatchingWithMicrophone() throws InterruptedException, ExecutionException {
        SpeechConfig config = SpeechConfig.fromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
        try (IntentRecognizer recognizer = new IntentRecognizer(config)) {
            // Creates a Pattern Matching model and adds specific intents from your model. The
            // Id is used to identify this model from others in the collection.
            PatternMatchingModel model = new PatternMatchingModel("YourPatternMatchingModelId");

            // Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
            String patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

            // Creates a pattern that uses an optional entity and group that could be used to tie commands together.
            String patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

            // You can also have multiple entities of the same name in a single pattern by adding appending a unique identifier
            // to distinguish between the instances. For example:
            String patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
            // NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
            // and is separated from the entity name by a ':'

            // Creates the pattern matching intents and adds them to the model
            model.getIntents().put(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
            model.getIntents().put(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

            // Creates the "floorName" entity and set it to type list.
            // Adds acceptable values. NOTE the default entity type is Any and so we do not need
            // to declare the "action" entity.
            model.getEntities().put(PatternMatchingEntity.CreateListEntity("floorName", PatternMatchingEntity.EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

            // Creates the "parkingLevel" entity as a pre-built integer
            model.getEntities().put(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

            ArrayList<LanguageUnderstandingModel> modelCollection = new ArrayList<LanguageUnderstandingModel>();
            modelCollection.add(model);

            recognizer.applyLanguageModels(modelCollection);

            System.out.println("Say something...");

            IntentRecognitionResult result = recognizer.recognizeOnceAsync().get();

            if (result.getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("RECOGNIZED: Text= " + result.getText());
                System.out.println(String.format("%17s", "Intent not recognized."));
            }
            else if (result.getReason() == ResultReason.RecognizedIntent)
            {
                System.out.println("RECOGNIZED: Text= " + result.getText());
                System.out.println(String.format("%17s %s", "Intent Id=", result.getIntentId() + "."));
                Dictionary<String, String> entities = result.getEntities();

                switch (result.getIntentId())
                {
                    case "ChangeFloors":
                        if (entities.get("floorName") != null) {
                            System.out.println(String.format("%17s %s", "FloorName=", entities.get("floorName")));
                        }
                        if (entities.get("floorName:1") != null) {
                            System.out.println(String.format("%17s %s", "FloorName:1=", entities.get("floorName:1")));
                        }
                        if (entities.get("floorName:2") != null) {
                            System.out.println(String.format("%17s %s", "FloorName:2=", entities.get("floorName:2")));
                        }
                        if (entities.get("parkingLevel") != null) {
                            System.out.println(String.format("%17s %s", "ParkingLevel=", entities.get("parkingLevel")));
                        }
                        break;

                    case "DoorControl":
                        if (entities.get("action") != null) {
                            System.out.println(String.format("%17s %s", "Action=", entities.get("action")));
                        }
                        break;
                }
            }
            else if (result.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (result.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(result);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());

                if (cancellation.getReason() == CancellationReason.Error)
                {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you update the subscription info?");
                }
            }
        }
    }
}

生成并运行应用

现在，你已准备好使用语音服务和嵌入式模式匹配程序生成应用并测试我们的意向识别。

在 Eclipse 中选择“运行”按钮或按 Ctrl+F11，然后观看输出中的“说些...”提示。一旦出现，它就会说出你的语句并观看输出。

例如，如果你说“Take me to floor 2”，则输出如下所示：

Say something...
RECOGNIZED: Text=Take me to floor 2.
       Intent Id=ChangeFloors.
       FloorName=2

作为另一个示例，如果你说“Take me to floor 7”，则输出如下所示：

Say something...
RECOGNIZED: Text=Take me to floor 7.
    Intent not recognized.

未识别任何意向，因为 floorName 的有效值列表中没有 7。

如何使用自定义实体模式匹配识别意向

何时使用模式匹配

先决条件

创建项目

从一些样本代码入手

创建语音配置

初始化 IntentRecognizer

添加一些意向

添加一些自定义实体

将模型应用于识别器

识别意向

显示识别结果（或错误）

查看代码

生成并运行应用

创建项目

从一些样本代码入手

创建语音配置

初始化 IntentRecognizer

添加一些意向

添加一些自定义实体

将模型应用于识别器

识别意向

显示识别结果（或错误）

查看代码

生成并运行应用

平台要求

安装适用于 Java 的语音 SDK

支持的操作系统

从一些样本代码入手

创建语音配置

初始化 IntentRecognizer

添加一些意向

添加一些自定义实体

将模型应用于识别器

识别意向

显示识别结果（或错误）

查看代码

生成并运行应用

其他资源