Learn the basics of speech recognition

One of the core features of the Speech service is the ability to recognize and transcribe human speech (often referred to as speech to text). In this article, you'll learn how to use the Speech SDK in your apps and products to perform high-quality speech recognition.

Tip

If you haven't had a chance to complete one of our quickstarts, we encourage you to kick the tires and try speech recognition out for yourself.

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.
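As a sketch, the non-subscription options look like the following. The factory method names (FromEndpoint, FromHost, FromAuthorizationToken) are from the Speech SDK for C#, but the placeholder values are illustrative only; check the API reference for the exact endpoint and host URL formats.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Placeholder values -- substitute your own endpoint, host, key, region, or token.
var fromEndpoint  = SpeechConfig.FromEndpoint(new Uri("YourServiceEndpoint"), "YourSubscriptionKey");
var fromHost      = SpeechConfig.FromHost(new Uri("YourHostAddress"));
var fromAuthToken = SpeechConfig.FromAuthorizationToken("YourAuthorizationToken", "YourServiceRegion");
```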

Let's take a look at how a SpeechConfig is created using a key and region. See the region support page to find your region identifier.

var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

Initialize a recognizer

After you've created a SpeechConfig, the next step is to initialize a SpeechRecognizer. When you initialize a SpeechRecognizer, you'll need to pass it your speechConfig. This provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the SpeechRecognizer should look like:

using var recognizer = new SpeechRecognizer(speechConfig);

If you want to specify the audio input device, you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your SpeechRecognizer.

First, add the following using statement.

using Microsoft.CognitiveServices.Speech.Audio;

Next, you'll be able to reference the AudioConfig object as follows:

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, when you create the AudioConfig, instead of calling FromDefaultMicrophoneInput, you'll call FromWavFileInput and pass the filename parameter.

using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Recognize speech

The Recognizer class for the Speech SDK for C# exposes a few methods that you can use for speech recognition.

  • Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. This recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end, or until a maximum of 15 seconds of audio is processed.
  • Continuous recognition (async) - Asynchronously initiates a continuous recognition operation. You register for events and handle the various application states. To stop asynchronous continuous recognition, call StopContinuousRecognitionAsync.

Note

Learn more about how to choose a speech recognition mode.

Single-shot recognition

Here's an example of asynchronous single-shot recognition using RecognizeOnceAsync:

var result = await recognizer.RecognizeOnceAsync();

You'll need to write some code to handle the result. This sample evaluates result.Reason:

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason.NoMatch
  • If an error is encountered, prints the error message: ResultReason.Canceled
switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        break;
}

Continuous recognition

Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a TaskCompletionSource<int> after the previous declarations.

var stopRecognition = new TaskCompletionSource<int>();

We'll subscribe to the events sent from the SpeechRecognizer.

  • Recognizing: Signal for events containing intermediate recognition results.
  • Recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • SessionStopped: Signal for events indicating the end of a recognition session (operation).
  • Canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
recognizer.Recognizing += (s, e) =>
{
    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
};

recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
    }
};

recognizer.Canceled += (s, e) =>
{
    Console.WriteLine($"CANCELED: Reason={e.Reason}");

    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you update the subscription info?");
    }

    stopRecognition.TrySetResult(0);
};

recognizer.SessionStopped += (s, e) =>
{
    Console.WriteLine("\n    Session stopped event.");
    stopRecognition.TrySetResult(0);
};

With everything set up, we can call StartContinuousRecognitionAsync.

// Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
await recognizer.StartContinuousRecognitionAsync();

// Waits for completion. Use Task.WaitAny to keep the task rooted.
Task.WaitAny(new[] { stopRecognition.Task });

// Stops recognition.
await recognizer.StopContinuousRecognitionAsync();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode causes the speech config instance to interpret word descriptions of sentence structures, such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the EnableDictation method on your SpeechConfig.

speechConfig.EnableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to Korean. In your code, find your SpeechConfig, then add this line directly below it.

speechConfig.SpeechRecognitionLanguage = "ko-KR";

The SpeechRecognitionLanguage property expects a language-locale format string. You can provide any value in the Locale column in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with AddPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

var phraseList = PhraseListGrammar.FromRecognizer(recognizer);
phraseList.AddPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseList.Clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechConfig is created using a key and region. See the region support page to find your region identifier.

auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

Initialize a recognizer

After you've created a SpeechConfig, the next step is to initialize a SpeechRecognizer. When you initialize a SpeechRecognizer, you'll need to pass it your config. This provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the SpeechRecognizer should look like:

auto recognizer = SpeechRecognizer::FromConfig(config);

If you want to specify the audio input device, you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your SpeechRecognizer.

First, add the following using namespace statement after your #include directives.

using namespace Microsoft::CognitiveServices::Speech::Audio;

Next, you'll be able to reference the AudioConfig object as follows:

auto audioConfig = AudioConfig::FromDefaultMicrophoneInput();
auto recognizer = SpeechRecognizer::FromConfig(config, audioConfig);

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, when you create the AudioConfig, instead of calling FromDefaultMicrophoneInput, you'll call FromWavFileInput and pass the filename parameter.

auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

Recognize speech

The Recognizer class for the Speech SDK for C++ exposes a few methods that you can use for speech recognition.

  • Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. This recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end, or until a maximum of 15 seconds of audio is processed.
  • Continuous recognition (async) - Asynchronously initiates a continuous recognition operation. You connect handlers to the events to receive the recognition results. To stop asynchronous continuous recognition, call StopContinuousRecognitionAsync.

Note

Learn more about how to choose a speech recognition mode.

Single-shot recognition

Here's an example of asynchronous single-shot recognition using RecognizeOnceAsync:

auto result = recognizer->RecognizeOnceAsync().get();

You'll need to write some code to handle the result. This sample evaluates result->Reason:

  • Prints the recognition result: ResultReason::RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason::NoMatch
  • If an error is encountered, prints the error message: ResultReason::Canceled
switch (result->Reason)
{
    case ResultReason::RecognizedSpeech:
        cout << "We recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        cout << "NOMATCH: Speech could not be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        {
            auto cancellation = CancellationDetails::FromResult(result);
            cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
            if (cancellation->Reason == CancellationReason::Error) {
                cout << "CANCELED: ErrorCode= " << (int)cancellation->ErrorCode << std::endl;
                cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                cout << "CANCELED: Did you update the subscription info?" << std::endl;
            }
        }
        break;
    default:
        break;
}

Continuous recognition

Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the Recognizing, Recognized, and Canceled events to get the recognition results. To stop recognition, you must call StopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a promise<void>, since at the start of recognition we can safely assume that it's not finished.

promise<void> recognitionEnd;

We'll subscribe to the events sent from the SpeechRecognizer.

  • Recognizing: Signal for events containing intermediate recognition results.
  • Recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • SessionStopped: Signal for events indicating the end of a recognition session (operation).
  • Canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        cout << "Recognizing:" << e.Result->Text << std::endl;
    });

recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED: Text=" << e.Result->Text << std::endl;
        }
        else if (e.Result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Speech could not be recognized." << std::endl;
        }
    });

recognizer->Canceled.Connect([&recognitionEnd](const SpeechRecognitionCanceledEventArgs& e)
    {
        cout << "CANCELED: Reason=" << (int)e.Reason << std::endl;
        if (e.Reason == CancellationReason::Error)
        {
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << "\n"
                 << "CANCELED: ErrorDetails=" << e.ErrorDetails << "\n"
                 << "CANCELED: Did you update the subscription info?" << std::endl;

            recognitionEnd.set_value(); // Notify to stop recognition.
        }
    });

recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped." << std::endl;
        recognitionEnd.set_value(); // Notify to stop recognition.
    });

With everything set up, we can call StartContinuousRecognitionAsync.

// Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
recognizer->StartContinuousRecognitionAsync().get();

// Waits for recognition end.
recognitionEnd.get_future().get();

// Stops recognition.
recognizer->StopContinuousRecognitionAsync().get();

Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode causes the speech config instance to interpret word descriptions of sentence structures, such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the EnableDictation method on your SpeechConfig.

config->EnableDictation();

Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to German. In your code, find your SpeechConfig, then add this line directly below it.

config->SetSpeechRecognitionLanguage("de-DE");

SetSpeechRecognitionLanguage is a method that takes a string as an argument. You can provide any value in the list of supported locales/languages.

Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

Important

The Phrase List feature is only available in English.

To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with AddPhrase.

Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

auto phraseListGrammar = PhraseListGrammar::FromRecognizer(recognizer);
phraseListGrammar->AddPhrase("Supercalifragilisticexpialidocious");

If you need to clear your phrase list:

phraseListGrammar->Clear();

Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.

Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a SpeechConfig. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

Note

Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a SpeechConfig:

  • With a subscription: pass in a key and the associated region.
  • With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
  • With a host: pass in a host address. A key or authorization token is optional.
  • With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a SpeechConfig is created using a key and region. See the region support page to find your region identifier.

SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

Initialize a recognizer

After you've created a SpeechConfig, the next step is to initialize a SpeechRecognizer. When you initialize a SpeechRecognizer, you'll need to pass it your config. This provides the credentials that the Speech service requires to validate your request.

If you're recognizing speech using your device's default microphone, here's what the SpeechRecognizer should look like:

SpeechRecognizer recognizer = new SpeechRecognizer(config);

If you want to specify the audio input device, you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your SpeechRecognizer.

First, add the following import statements.

import java.util.concurrent.Future;
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;

Next, you'll be able to reference the AudioConfig object as follows:

AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);

If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. However, when you create the AudioConfig, instead of calling fromDefaultMicrophoneInput, you'll call fromWavFileInput and pass the filename parameter.

AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);

Recognize speech

The Recognizer class for the Speech SDK for Java exposes a few methods that you can use for speech recognition.

  • Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. This recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end, or until a maximum of 15 seconds of audio is processed.
  • Continuous recognition (async) - Asynchronously initiates a continuous recognition operation. You subscribe to events and handle the recognition results as they arrive. To stop asynchronous continuous recognition, call stopContinuousRecognitionAsync.

Note

Learn more about how to choose a speech recognition mode.

Single-shot recognition

Here's an example of asynchronous single-shot recognition using recognizeOnceAsync:

Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
SpeechRecognitionResult result = task.get();

You'll need to write some code to handle the result. This sample evaluates result.getReason():

  • Prints the recognition result: ResultReason.RecognizedSpeech
  • If there is no recognition match, informs the user: ResultReason.NoMatch
  • If an error is encountered, prints the error message: ResultReason.Canceled
switch (result.getReason()) {
    case ResultReason.RecognizedSpeech:
        System.out.println("We recognized: " + result.getText());
        exitCode = 0;
        break;
    case ResultReason.NoMatch:
        System.out.println("NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled: {
            CancellationDetails cancellation = CancellationDetails.fromResult(result);
            System.out.println("CANCELED: Reason=" + cancellation.getReason());

            if (cancellation.getReason() == CancellationReason.Error) {
                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                System.out.println("CANCELED: Did you update the subscription info?");
            }
        }
        break;
}

Continuous recognition

Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. To stop recognition, you must call stopContinuousRecognitionAsync. Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a SpeechRecognizer:

AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a Semaphore at the class scope.

private static Semaphore stopTranslationWithFileSemaphore;
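To see how this semaphore coordinates the two threads (independent of the Speech SDK), here's a minimal stand-alone sketch: a semaphore created with zero permits blocks acquire() until a background thread, playing the role of the sessionStopped/canceled listener, calls release().

```java
import java.util.concurrent.Semaphore;

public class SemaphoreStopSignal {
    public static String runSession() throws InterruptedException {
        Semaphore stopSemaphore = new Semaphore(0); // zero permits: acquire() blocks
        Thread listenerThread = new Thread(() -> {
            // Stand-in for the sessionStopped/canceled listener calling release().
            stopSemaphore.release();
        });
        listenerThread.start();
        stopSemaphore.acquire(); // Blocks until release() is called.
        listenerThread.join();
        return "session stopped";
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runSession());
    }
}
```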

我们将订阅从 SpeechRecognizer 发送的事件。We'll subscribe to the events sent from the SpeechRecognizer.

  • recognizing:事件信号,包含中间识别结果。recognizing: Signal for events containing intermediate recognition results.
  • recognized:事件信号,包含最终识别结果(指示成功的识别尝试)。recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • sessionStopped:事件信号,指示识别会话的结束(操作)。sessionStopped: Signal for events indicating the end of a recognition session (operation).
  • canceled:事件信号,包含已取消的识别结果(指示因直接取消请求或者传输或协议失败导致的识别尝试取消)。canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
// First initialize the semaphore.
stopTranslationWithFileSemaphore = new Semaphore(0);

recognizer.recognizing.addEventListener((s, e) -> {
    System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});

recognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    }
    else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});

recognizer.canceled.addEventListener((s, e) -> {
    System.out.println("CANCELED: Reason=" + e.getReason());

    if (e.getReason() == CancellationReason.Error) {
        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
        System.out.println("CANCELED: Did you update the subscription info?");
    }

    stopTranslationWithFileSemaphore.release();
});

recognizer.sessionStopped.addEventListener((s, e) -> {
    System.out.println("\n    Session stopped event.");
    stopTranslationWithFileSemaphore.release();
});

完成所有设置后,可以调用 startContinuousRecognitionAsync 开始识别。With everything set up, we can call startContinuousRecognitionAsync.

// Starts continuous recognition. Uses stopContinuousRecognitionAsync() to stop recognition.
recognizer.startContinuousRecognitionAsync().get();

// Waits for completion.
stopTranslationWithFileSemaphore.acquire();

// Stops recognition.
recognizer.stopContinuousRecognitionAsync().get();

听写模式Dictation mode

使用连续识别时,可以使用相应的“启用听写”功能启用听写处理。When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. 此模式将导致语音配置实例解释句子结构的单词说明(如标点符号)。This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. 例如,言语“你居住在城镇吗问号”会被解释为文本“你居住在城镇吗?”。For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

若要启用听写模式,请在 SpeechConfig 上使用 enableDictation 方法。To enable dictation mode, use the enableDictation method on your SpeechConfig.

config.enableDictation();

更改源语言Change source language

语音识别的常见任务是指定输入(或源)语言。A common task for speech recognition is specifying the input (or source) language. 让我们看看如何将输入语言更改为法语。Let's take a look at how you would change the input language to French. 在代码中找到 SpeechConfig,并直接在其下方添加此行。In your code, find your SpeechConfig, then add this line directly below it.

config.setSpeechRecognitionLanguage("fr-FR");

setSpeechRecognitionLanguage 是一个采用字符串作为参数的方法。setSpeechRecognitionLanguage is a method that takes a string as an argument. 可以提供受支持的区域设置/语言的列表中的任何值。You can provide any value in the list of supported locales/languages.

提高识别准确度Improve recognition accuracy

可以通过多种方式使用语音 SDK 来提高识别的准确性。There are a few ways to improve recognition accuracy with the Speech SDK. 让我们看一下短语列表。Let's take a look at Phrase Lists. 短语列表用于标识音频数据中的已知短语,如人的姓名或特定位置。Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. 可以将单个词或完整短语添加到短语列表。Single words or complete phrases can be added to a Phrase List. 在识别期间,如果音频中包含整个短语的完全匹配项,则使用短语列表中的条目。During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. 如果找不到与整个短语完全匹配的项,则不会对识别提供帮助。If an exact match to the phrase is not found, recognition is not assisted.

重要

短语列表功能仅以英语提供。The Phrase List feature is only available in English.

若要使用短语列表,请首先创建一个 PhraseListGrammar 对象,然后使用 addPhrase 添加特定的单词和短语。To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 所做的任何更改都将在下一次识别或重新连接到语音服务之后生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

PhraseListGrammar phraseList = PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");

如果需要清除短语列表:If you need to clear your phrase list:

phraseList.clear();

提高识别精确度的其他方式Other options to improve recognition accuracy

短语列表只是提高识别准确度的一种方式。Phrase lists are only one option to improve recognition accuracy. 你还可以:You can also:

先决条件Prerequisites

本文假定你有 Azure 帐户和语音服务订阅。This article assumes that you have an Azure account and Speech service subscription. 如果你没有帐户和订阅,可以免费试用语音服务If you don't have an account and subscription, try the Speech service for free.

安装语音 SDKInstall the Speech SDK

需要先安装 JavaScript 语音 SDK,然后才能执行任何操作。Before you can do anything, you'll need to install the JavaScript Speech SDK. 根据你的平台,使用以下说明:Depending on your platform, use the following instructions:

另外,请根据目标环境使用以下项之一:Additionally, depending on the target environment use one of the following:

import {
    AudioConfig,
    CancellationDetails,
    CancellationReason,
    PhraseListGrammar,
    ResultReason,
    SpeechConfig,
    SpeechRecognizer
} from "microsoft-cognitiveservices-speech-sdk";

有关 import 的详细信息,请参阅 export 和 import For more information on import, see export and import .

创建语音配置Create a speech configuration

若要使用语音 SDK 调用语音服务,需要创建 SpeechConfigTo call the Speech service using the Speech SDK, you need to create a SpeechConfig. 此类包含有关你的订阅的信息,例如你的密钥和关联的区域、终结点、主机或授权令牌。This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

备注

无论你是要执行语音识别、语音合成、翻译,还是意向识别,都需要创建一个配置。Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

可以通过以下几种方法初始化 SpeechConfigThere are a few ways that you can initialize a SpeechConfig:

  • 使用订阅:传入密钥和关联的区域。With a subscription: pass in a key and the associated region.
  • 使用终结点:传入语音服务终结点。With an endpoint: pass in a Speech service endpoint. 密钥或授权令牌是可选的。A key or authorization token is optional.
  • 使用主机:传入主机地址。With a host: pass in a host address. 密钥或授权令牌是可选的。A key or authorization token is optional.
  • 使用授权令牌:传入授权令牌和关联的区域。With an authorization token: pass in an authorization token and the associated region.

让我们看看如何使用密钥和区域创建 SpeechConfigLet's take a look at how a SpeechConfig is created using a key and region. 请查看区域支持页以找到你的区域标识符。See the region support page to find your region identifier.

const speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

初始化识别器Initialize a recognizer

创建 SpeechConfig 后,下一步是初始化 SpeechRecognizerAfter you've created a SpeechConfig, the next step is to initialize a SpeechRecognizer. 初始化 SpeechRecognizer 时,需要向其传递 speechConfigWhen you initialize a SpeechRecognizer, you'll need to pass it your speechConfig. 这会提供语音服务验证请求所需的凭据。This provides the credentials that the speech service requires to validate your request.

如果使用设备的默认麦克风识别语音,则 SpeechRecognizer 应如下所示:If you're recognizing speech using your device's default microphone, here's what the SpeechRecognizer should look like:

const recognizer = new SpeechRecognizer(speechConfig);

如果要指定音频输入设备,则需要创建一个 AudioConfig 并在初始化 SpeechRecognizer 时提供 audioConfig 参数。If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audioConfig parameter when initializing your SpeechRecognizer.

引用 AudioConfig 对象,如下所示:Reference the AudioConfig object as follows:

const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new SpeechRecognizer(speechConfig, audioConfig);

如果要提供音频文件而不是使用麦克风,则仍需要提供 audioConfig。If you want to provide an audio file instead of using a microphone, you'll still need to provide an audioConfig. 但是,只有在以 Node.js 为目标时才能这样做。创建 AudioConfig 时,需调用 fromWavFileInput 并传递 filename 参数,而不是调用 fromDefaultMicrophoneInput。However, this can only be done when targeting Node.js. When you create the AudioConfig, instead of calling fromDefaultMicrophoneInput, you'll call fromWavFileInput and pass the filename parameter.

const audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
const recognizer = new SpeechRecognizer(speechConfig, audioConfig);

识别语音Recognize speech

用于 JavaScript 的语音 SDK 的识别器类公开了一些可用于语音识别的方法。The Recognizer class for the Speech SDK for JavaScript exposes a few methods that you can use for speech recognition.

  • 单步识别(异步)- 在非阻塞(异步)模式下执行识别。Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. 这将识别单个言语。This will recognize a single utterance. 单个言语的结束是通过在结束时倾听静音或处理最长 15 秒音频时确定的。The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
  • 连续识别(异步)- 异步启动连续识别操作。Continuous recognition (async) - Asynchronously initiates continuous recognition operation. 用户向事件注册并处理各种应用程序状态。The user registers to events and handles various application state. 若要停止异步连续识别,请调用 stopContinuousRecognitionAsyncTo stop asynchronous continuous recognition, call stopContinuousRecognitionAsync.

备注

详细了解如何选择语音识别模式Learn more about how to choose a speech recognition mode.

单步识别Single-shot recognition

下面是使用 recognizeOnceAsync 进行异步单步识别的示例:Here's an example of asynchronous single-shot recognition using recognizeOnceAsync:

recognizer.recognizeOnceAsync(result => {
    // Interact with result
});

需要编写一些代码来处理结果。You'll need to write some code to handle the result. 此示例计算 result.reasonThis sample evaluates the result.reason:

  • 输出识别结果:ResultReason.RecognizedSpeechPrints the recognition result: ResultReason.RecognizedSpeech
  • 如果没有识别匹配项,请通知用户:ResultReason.NoMatchIf there is no recognition match, inform the user: ResultReason.NoMatch
  • 如果遇到错误,则输出错误消息:ResultReason.CanceledIf an error is encountered, print the error message: ResultReason.Canceled
switch (result.reason) {
    case ResultReason.RecognizedSpeech:
        console.log(`RECOGNIZED: Text=${result.text}`);
        break;
    case ResultReason.NoMatch:
        console.log("NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled: {
        const cancellation = CancellationDetails.fromResult(result);
        console.log(`CANCELED: Reason=${cancellation.reason}`);

        if (cancellation.reason == CancellationReason.Error) {
            console.log(`CANCELED: ErrorCode=${cancellation.errorCode}`);
            console.log(`CANCELED: ErrorDetails=${cancellation.errorDetails}`);
            console.log("CANCELED: Did you update the subscription info?");
        }
        break;
    }
}

连续识别Continuous recognition

连续识别涉及的方面比单步识别多一点。Continuous recognition is a bit more involved than single-shot recognition. 它要求你订阅 recognizing、recognized 和 canceled 事件以获取识别结果。It requires you to subscribe to the recognizing, recognized, and canceled events to get the recognition results. 若要停止识别,必须调用 stopContinuousRecognitionAsync。To stop recognition, you must call stopContinuousRecognitionAsync. 下面是有关如何对音频输入文件执行连续识别的示例。Here's an example of how continuous recognition is performed on an audio input file.

首先,我们将定义输入并初始化一个 SpeechRecognizerLet's start by defining the input and initializing a SpeechRecognizer:

const audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
const recognizer = new SpeechRecognizer(speechConfig, audioConfig);

我们将订阅从 SpeechRecognizer 发送的事件。We'll subscribe to the events sent from the SpeechRecognizer.

  • recognizing:事件信号,包含中间识别结果。recognizing: Signal for events containing intermediate recognition results.
  • recognized:事件信号,包含最终识别结果(指示成功的识别尝试)。recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • sessionStopped:事件信号,指示识别会话的结束(操作)。sessionStopped: Signal for events indicating the end of a recognition session (operation).
  • canceled:事件信号,包含已取消的识别结果(指示因直接取消请求或者传输或协议失败导致的识别尝试取消)。canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
recognizer.recognizing = (s, e) => {
    console.log(`RECOGNIZING: Text=${e.result.text}`);
};

recognizer.recognized = (s, e) => {
    if (e.result.reason == ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`);
    }
    else if (e.result.reason == ResultReason.NoMatch) {
        console.log("NOMATCH: Speech could not be recognized.");
    }
};

recognizer.canceled = (s, e) => {
    console.log(`CANCELED: Reason=${e.reason}`);

    if (e.reason == CancellationReason.Error) {
        console.log(`CANCELED: ErrorCode=${e.errorCode}`);
        console.log(`CANCELED: ErrorDetails=${e.errorDetails}`);
        console.log("CANCELED: Did you update the subscription info?");
    }

    recognizer.stopContinuousRecognitionAsync();
};

recognizer.sessionStopped = (s, e) => {
    console.log("\n    Session stopped event.");
    recognizer.stopContinuousRecognitionAsync();
};

完成所有设置后,可以调用 startContinuousRecognitionAsync 开始识别。With everything set up, we can call startContinuousRecognitionAsync.

// Starts continuous recognition. Uses stopContinuousRecognitionAsync() to stop recognition.
recognizer.startContinuousRecognitionAsync();

// Something later can call this to stop recognition:
// recognizer.stopContinuousRecognitionAsync();

听写模式Dictation mode

使用连续识别时,可以使用相应的“启用听写”功能启用听写处理。When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. 此模式将导致语音配置实例解释句子结构的单词说明(如标点符号)。This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. 例如,言语“你居住在城镇吗问号”会被解释为文本“你居住在城镇吗?”。For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

若要启用听写模式,请在 SpeechConfig 上使用 enableDictation 方法。To enable dictation mode, use the enableDictation method on your SpeechConfig.

speechConfig.enableDictation();

更改源语言Change source language

语音识别的常见任务是指定输入(或源)语言。A common task for speech recognition is specifying the input (or source) language. 让我们看看如何将输入语言更改为意大利语。Let's take a look at how you would change the input language to Italian. 在代码中找到 SpeechConfig,并直接在其下方添加此行。In your code, find your SpeechConfig, then add this line directly below it.

speechConfig.speechRecognitionLanguage = "it-IT";

speechRecognitionLanguage 属性需要语言区域设置格式字符串。The speechRecognitionLanguage property expects a language-locale format string. 可以提供受支持的区域设置/语言的列表中“区域设置”列中的任何值 。You can provide any value in the Locale column in the list of supported locales/languages.

提高识别准确度Improve recognition accuracy

可以通过多种方式使用语音 SDK 来提高识别的准确性。There are a few ways to improve recognition accuracy with the Speech SDK. 让我们看看短语列表。Let's take a look at Phrase Lists. 短语列表用于标识音频数据中的已知短语,如人的姓名或特定位置。Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. 可以将单个词或完整短语添加到短语列表。Single words or complete phrases can be added to a Phrase List. 在识别期间,如果音频中包含整个短语的完全匹配项,则使用短语列表中的条目。During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. 如果找不到与整个短语完全匹配的项,则不会对识别提供帮助。If an exact match to the phrase is not found, recognition is not assisted.

重要

短语列表功能仅以英语提供。The Phrase List feature is only available in English.

若要使用短语列表,请首先创建一个 PhraseListGrammar 对象,然后使用 addPhrase 添加特定的单词和短语。To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 所做的任何更改都将在下一次识别或重新连接到语音服务之后生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

const phraseList = PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");

如果需要清除短语列表:If you need to clear your phrase list:

phraseList.clear();

提高识别精确度的其他方式Other options to improve recognition accuracy

短语列表只是提高识别准确度的一种方式。Phrase lists are only one option to improve recognition accuracy. 你还可以:You can also:

先决条件Prerequisites

本文假定你有 Azure 帐户和语音服务订阅。This article assumes that you have an Azure account and Speech service subscription. 如果你没有帐户和订阅,可以免费试用语音服务。If you don't have an account and subscription, try the Speech service for free.

安装和导入语音 SDKInstall and import the Speech SDK

你需要先安装语音 SDK,然后才能执行任何操作。Before you can do anything, you'll need to install the Speech SDK.

pip install azure-cognitiveservices-speech

如果使用的是 macOS 且你遇到安装问题,则可能需要先运行此命令。If you're on macOS and run into install issues, you may need to run this command first.

python3 -m pip install --upgrade pip

安装语音 SDK 后,将其导入到 Python 项目中,其中包含此语句。After the Speech SDK is installed, import it into your Python project with this statement.

import azure.cognitiveservices.speech as speechsdk

创建语音配置Create a speech configuration

若要使用语音 SDK 调用语音服务,需要创建 SpeechConfigTo call the Speech service using the Speech SDK, you need to create a SpeechConfig. 此类包含有关你的订阅的信息,例如你的密钥和关联的区域、终结点、主机或授权令牌。This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

备注

无论你是要执行语音识别、语音合成、翻译,还是意向识别,都需要创建一个配置。Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

可以通过以下几种方法初始化 SpeechConfigThere are a few ways that you can initialize a SpeechConfig:

  • 使用订阅:传入密钥和关联的区域。With a subscription: pass in a key and the associated region.
  • 使用终结点:传入语音服务终结点。With an endpoint: pass in a Speech service endpoint. 密钥或授权令牌是可选的。A key or authorization token is optional.
  • 使用主机:传入主机地址。With a host: pass in a host address. 密钥或授权令牌是可选的。A key or authorization token is optional.
  • 使用授权令牌:传入授权令牌和关联的区域。With an authorization token: pass in an authorization token and the associated region.

让我们看看如何使用密钥和区域创建 SpeechConfigLet's take a look at how a SpeechConfig is created using a key and region. 请查看区域支持页以找到你的区域标识符。See the region support page to find your region identifier.

speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

初始化识别器Initialize a recognizer

创建 SpeechConfig 后,下一步是初始化 SpeechRecognizerAfter you've created a SpeechConfig, the next step is to initialize a SpeechRecognizer. 初始化 SpeechRecognizer 时,需要向其传递 speech_configWhen you initialize a SpeechRecognizer, you'll need to pass it your speech_config. 这会提供语音服务验证请求所需的凭据。This provides the credentials that the speech service requires to validate your request.

如果使用设备的默认麦克风识别语音,则 SpeechRecognizer 应如下所示:If you're recognizing speech using your device's default microphone, here's what the SpeechRecognizer should look like:

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

如果要指定音频输入设备,则需要创建一个 AudioConfig 并在初始化 SpeechRecognizer 时提供 audio_config 参数。If you want to specify the audio input device, then you'll need to create an AudioConfig and provide the audio_config parameter when initializing your SpeechRecognizer.

audio_config = speechsdk.AudioConfig(device_name="<device id>")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

如果要提供音频文件而不是使用麦克风,则仍需要提供 audio_configIf you want to provide an audio file instead of using a microphone, you'll still need to provide an audio_config. 但是,在创建 AudioConfig 时,你需要使用 filename 参数,而不是提供 device_nameHowever, when you create an AudioConfig, instead of providing the device_name, you'll use the filename parameter.

audio_filename = "whatstheweatherlike.wav"
audio_input = speechsdk.AudioConfig(filename=audio_filename)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

识别语音Recognize speech

用于 Python 的语音 SDK 的识别器类公开了一些可用于语音识别的方法。The Recognizer class for the Speech SDK for Python exposes a few methods that you can use for speech recognition.

  • 单步识别(同步)- 在阻塞(同步)模式下执行识别。Single-shot recognition (sync) - Performs recognition in a blocking (synchronous) mode. 在识别单个言语后返回。Returns after a single utterance is recognized. 单个言语的结束是通过在结束时倾听静音或处理最长 15 秒音频时确定的。The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed. 该任务返回作为结果的识别文本。The task returns the recognition text as result.
  • 单步识别(异步)- 在非阻塞(异步)模式下执行识别。Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. 这将识别单个言语。This will recognize a single utterance. 单个言语的结束是通过在结束时倾听静音或处理最长 15 秒音频时确定的。The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
  • 连续识别(同步)- 同步启动连续识别。Continuous recognition (sync) - Synchronously initiates continuous recognition. 客户端必须连接到 EventSignal 才能接收识别结果。The client must connect to EventSignal to receive recognition results. 若要停止识别,请调用 stop_continuous_recognition()To stop recognition, call stop_continuous_recognition().
  • 连续识别(异步)- 异步启动连续识别操作。Continuous recognition (async) - Asynchronously initiates continuous recognition operation. 用户必须连接到 EventSignal 才能接收识别结果。User has to connect to EventSignal to receive recognition results. 若要停止异步连续识别,请调用 stop_continuous_recognition()To stop asynchronous continuous recognition, call stop_continuous_recognition().

备注

详细了解如何选择语音识别模式Learn more about how to choose a speech recognition mode.

单步识别Single-shot recognition

下面是使用 recognize_once() 进行同步单步识别的示例:Here's an example of synchronous single-shot recognition using recognize_once():

result = speech_recognizer.recognize_once()

下面是使用 recognize_once_async() 进行异步单步识别的示例:Here's an example of asynchronous single-shot recognition using recognize_once_async():

result = speech_recognizer.recognize_once_async().get()

无论你使用的是同步方法还是异步方法,都需要编写一些代码来处理结果。Regardless of whether you've used the synchronous or asynchronous method, you'll need to write some code to handle the result. 此示例将计算 result.reason:This sample evaluates the result.reason:

  • 输出识别结果:speechsdk.ResultReason.RecognizedSpeechPrints the recognition result: speechsdk.ResultReason.RecognizedSpeech
  • 如果没有识别匹配项,请通知用户:speechsdk.ResultReason.NoMatch If there is no recognition match, inform the user: speechsdk.ResultReason.NoMatch
  • 如果遇到错误,则输出错误消息:speechsdk.ResultReason.CanceledIf an error is encountered, print the error message: speechsdk.ResultReason.Canceled
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
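上面的分支逻辑也可以封装成一个可复用的函数。下面是一个不依赖 SDK 的最小示意,其中的 Reason 枚举和 FakeResult 对象是为演示而虚构的替身(与真实 SDK 的类型不同),仅用于演示按 result.reason 分派的模式:The branching logic above can also be wrapped in a reusable function. Here's a minimal, SDK-independent sketch; the Reason enum and FakeResult object are hypothetical stand-ins invented for this illustration (not the real SDK types), used only to show the dispatch-on-result.reason pattern:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical stand-ins for the SDK's ResultReason and result object,
# used here only to exercise the branching pattern shown above.
class Reason(Enum):
    RecognizedSpeech = 1
    NoMatch = 2
    Canceled = 3

@dataclass
class FakeResult:
    reason: Reason
    text: str = ""
    error_details: Optional[str] = None

def describe(result: FakeResult) -> str:
    """Map a recognition result to a user-facing message."""
    if result.reason == Reason.RecognizedSpeech:
        return "Recognized: {}".format(result.text)
    elif result.reason == Reason.NoMatch:
        return "No speech could be recognized"
    else:
        return "Canceled: {}".format(result.error_details or "no error")

print(describe(FakeResult(Reason.RecognizedSpeech, text="Hello world")))
# → Recognized: Hello world
```

这样同一套处理逻辑可以同时用于同步和异步调用的结果。This lets the same handling logic serve results from both the synchronous and asynchronous calls.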

连续识别Continuous recognition

连续识别涉及的方面比单步识别多一点。Continuous recognition is a bit more involved than single-shot recognition. 它要求你连接到 EventSignal 以获取识别结果;若要停止识别,必须调用 stop_continuous_recognition() 或 stop_continuous_recognition_async()。It requires you to connect to the EventSignal to get the recognition results, and to stop recognition, you must call stop_continuous_recognition() or stop_continuous_recognition_async(). 下面是有关如何对音频输入文件执行连续识别的示例。Here's an example of how continuous recognition is performed on an audio input file.

首先,我们将定义输入并初始化一个 SpeechRecognizerLet's start by defining the input and initializing a SpeechRecognizer:

audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

接下来,让我们创建一个变量来管理语音识别的状态。Next, let's create a variable to manage the state of speech recognition. 首先,我们将此设置为“False”,因为在开始识别时,我们可以放心地假定该操作尚未完成。To start, we'll set this to False, since at the start of recognition we can safely assume that it's not finished.

done = False

现在,我们将创建一个回调,以在接收到 evt 时停止连续识别。Now, we're going to create a callback to stop continuous recognition when an evt is received. 需谨记以下几点。There are a few things to keep in mind.

  • 接收到 evt 时,系统将输出 evt 消息。When an evt is received, the evt message is printed.
  • 接收到 evt 后,系统将调用 stop_continuous_recognition() 来停止识别。After an evt is received, stop_continuous_recognition() is called to stop recognition.
  • 识别状态将更改为 TrueThe recognition state is changed to True.
def stop_cb(evt):
    print('CLOSING on {}'.format(evt))
    speech_recognizer.stop_continuous_recognition()
    # `nonlocal` assumes this callback is defined inside an enclosing
    # function; at module scope, use `global done` instead.
    nonlocal done
    done = True

此代码示例演示如何将回调连接到从 SpeechRecognizer 发送的事件。This code sample shows how to connect callbacks to events sent from the SpeechRecognizer.

  • recognizing:事件信号,包含中间识别结果。recognizing: Signal for events containing intermediate recognition results.
  • recognized:事件信号,包含最终识别结果(指示成功的识别尝试)。recognized: Signal for events containing final recognition results (indicating a successful recognition attempt).
  • session_started:事件信号,指示识别会话的开始(操作)。session_started: Signal for events indicating the start of a recognition session (operation).
  • session_stopped:事件信号,指示识别会话的结束(操作)。session_stopped: Signal for events indicating the end of a recognition session (operation).
  • canceled:事件信号,包含已取消的识别结果(指示因直接取消请求或者传输或协议失败导致的识别尝试取消)。canceled: Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

完成所有设置后,可以调用 start_continuous_recognition()With everything set up, we can call start_continuous_recognition().

import time

speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(.5)
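上面的"回调 + 状态标志"模式本身并不依赖语音服务。下面是一个最小示意,用一个为演示而虚构的 SimulatedRecognizer(不是 SDK 类型)在后台线程中触发事件,来重现同样的控制流:The callback-plus-state-flag pattern above doesn't depend on the Speech service itself. Here's a minimal sketch that reproduces the same control flow with a SimulatedRecognizer invented for this illustration (not an SDK type), which fires events from a background thread:

```python
import threading
import time

# A hypothetical stand-in for SpeechRecognizer: it fires `recognized`
# callbacks from a background thread, then a `session_stopped` callback,
# mirroring the continuous-recognition control flow above.
class SimulatedRecognizer:
    def __init__(self, utterances):
        self.utterances = utterances
        self.recognized = []       # callbacks for final results
        self.session_stopped = []  # callbacks for end of session
        self._thread = None

    def start_continuous_recognition(self):
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        for text in self.utterances:
            for cb in self.recognized:
                cb(text)
        for cb in self.session_stopped:
            cb("session stopped")

    def stop_continuous_recognition(self):
        self._thread.join()

done = False
results = []

def stop_cb(evt):
    global done
    print("CLOSING on {}".format(evt))
    done = True

recognizer = SimulatedRecognizer(["what's the weather like", "in seattle"])
recognizer.recognized.append(lambda text: results.append(text))
recognizer.session_stopped.append(stop_cb)

recognizer.start_continuous_recognition()
while not done:       # poll the state flag, exactly as in the SDK sample
    time.sleep(0.05)
recognizer.stop_continuous_recognition()

print(results)  # both utterances, in order
```

换成真实的 SpeechRecognizer 时,只是把事件注册方式换成 .connect(),控制流完全相同。With the real SpeechRecognizer, only the event-registration calls change (to .connect()); the control flow is identical.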

听写模式Dictation mode

使用连续识别时,可以使用相应的“启用听写”功能启用听写处理。When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. 此模式将导致语音配置实例解释句子结构的单词说明(如标点符号)。This mode will cause the speech config instance to interpret word descriptions of sentence structures such as punctuation. 例如,言语“你居住在城镇吗问号”会被解释为文本“你居住在城镇吗?”。For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

若要启用听写模式,请在 SpeechConfig 上使用 enable_dictation() 方法。To enable dictation mode, use the enable_dictation() method on your SpeechConfig.

speech_config.enable_dictation()
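听写处理在服务端完成;为了直观说明其行为,下面是一个与 SDK 无关的纯 Python 示意,其中的标点映射表是假设的简化规则,并非服务的实际实现:Dictation processing happens on the service side; to make the behavior concrete, here's an SDK-independent sketch in plain Python. The punctuation mapping below is a simplified assumption, not the service's actual rule set:

```python
# A toy sketch of what dictation mode does: spoken punctuation words
# are turned into punctuation marks. Illustrative mapping only.
PUNCTUATION_WORDS = {
    "question mark": "?",
    "exclamation point": "!",
    "period": ".",
    "comma": ",",
}

def apply_dictation(utterance: str) -> str:
    """Replace spoken-punctuation phrases with their symbols."""
    text = utterance
    for phrase, mark in PUNCTUATION_WORDS.items():
        text = text.replace(" " + phrase, mark)
    return text

print(apply_dictation("Do you live in town question mark"))
# → Do you live in town?
```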

更改源语言Change source language

语音识别的常见任务是指定输入(或源)语言。A common task for speech recognition is specifying the input (or source) language. 让我们看看如何将输入语言更改为德语。Let's take a look at how you would change the input language to German. 在代码中找到 SpeechConfig,并直接在其下方添加此行。In your code, find your SpeechConfig, then add this line directly below it.

speech_config.speech_recognition_language="de-DE"

speech_recognition_language 属性需要语言区域设置格式字符串。The speech_recognition_language property expects a language-locale format string. 可以提供受支持的区域设置/语言的列表中的任何值。You can provide any value in the list of supported locales/languages.

提高识别准确度Improve recognition accuracy

可以通过多种方式使用语音 SDK 来提高识别的准确性。There are a few ways to improve recognition accuracy with the Speech SDK. 让我们看一下短语列表。Let's take a look at Phrase Lists. 短语列表用于标识音频数据中的已知短语,如人的姓名或特定位置。Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. 可以将单个词或完整短语添加到短语列表。Single words or complete phrases can be added to a Phrase List. 在识别期间,如果音频中包含整个短语的完全匹配项,则使用短语列表中的条目。During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. 如果找不到与整个短语完全匹配的项,则不会对识别提供帮助。If an exact match to the phrase is not found, recognition is not assisted.

重要

短语列表功能仅以英语提供。The Phrase List feature is only available in English.

若要使用短语列表,请首先创建一个 PhraseListGrammar 对象,然后使用 addPhrase 添加特定的单词和短语。To use a phrase list, first create a PhraseListGrammar object, then add specific words and phrases with addPhrase.

PhraseListGrammar 所做的任何更改都将在下一次识别或重新连接到语音服务之后生效。Any changes to PhraseListGrammar take effect on the next recognition or after a reconnection to the Speech service.

phrase_list_grammar = speechsdk.PhraseListGrammar.from_recognizer(speech_recognizer)
phrase_list_grammar.addPhrase("Supercalifragilisticexpialidocious")

如果需要清除短语列表:If you need to clear your phrase list:

phrase_list_grammar.clear()
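为了说明"只有整个短语完全匹配才会生效"这一语义,下面是一个与 SDK 无关的示意函数;其中的匹配规则是简化的假设,真实服务在音频解码阶段应用短语偏置:To illustrate the "exact match for the entire phrase" semantics, here's an SDK-independent sketch; the matching rule is a simplified assumption, since the real service applies phrase biasing during audio decoding:

```python
# Sketch of the exact-match semantics described above: a phrase-list
# entry only assists if the whole phrase appears verbatim (as a
# contiguous word sequence) in the transcription of the audio.
def phrase_assists(phrase: str, transcription: str) -> bool:
    phrase_words = phrase.lower().split()
    words = transcription.lower().split()
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words
               for i in range(len(words) - n + 1))

print(phrase_assists("Supercalifragilisticexpialidocious",
                     "she said supercalifragilisticexpialidocious twice"))  # → True
print(phrase_assists("Jane Doe", "jane was here"))  # → False
```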

提高识别精确度的其他方式Other options to improve recognition accuracy

短语列表只是提高识别准确度的一种方式。Phrase lists are only one option to improve recognition accuracy. 你还可以:You can also:

其他语言和平台支持Additional language and platform support

如果已单击此选项卡,则可能看不到你偏好的编程语言的基础知识文章。If you've clicked this tab, you probably didn't see a basics article in your favorite programming language. 别担心,我们在 GitHub 上提供了其他代码示例。Don't worry, we have additional code samples available on GitHub. 使用表格查找适用于编程语言和平台/OS 组合的相应示例。Use the table to find the right sample for your programming language and platform/OS combination.

语言Language  代码示例Code samples
C#  .NET Framework、.NET Core、UWP、Unity、Xamarin
C++  快速入门、示例Quickstarts, Samples
Java  Android、JRE
JavaScript  浏览器Browser
Node.js  Windows、Linux、macOS
Objective-C  iOS、macOS
Python  Windows、Linux、macOS
Swift  iOS、macOS

后续步骤Next steps