Quickstart: Recognize speech from an audio file

Important

Speech SDK version 1.11.0 or later is required.

In this quickstart, you use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file takes only a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK C# samples on GitHub.

Prerequisites

Before you get started, make sure to:

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Outside of WAV/PCM, the compressed input formats listed below are also supported. Additional configuration is needed to enable them.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in WAV container
  • MULAW in WAV container

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

  1. Launch Visual Studio 2019.
  2. Load your project and open Program.cs.
  3. Download the whatstheweatherlike.wav file and add it to your project.
    • Save the whatstheweatherlike.wav file next to the Program.cs file.
    • From Solution Explorer, right-click the project and select Add > Existing Item.
    • Select the whatstheweatherlike.wav file, then select the Add button.
    • Right-click the newly added file and select Properties.
    • Change Copy to Output Directory to Copy always.

Start with some boilerplate code

Let's add some code that works as a skeleton for our project. Note that you're creating an async method called RecognizeSpeechAsync().

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace HelloWorld
{
    class Program
    {
        static async Task Main()
        {
            await RecognizeSpeechAsync();
        }

        static async Task RecognizeSpeechAsync()
        {
        }
    }
}

Create a Speech configuration

Before you can initialize a SpeechRecognizer object, you need to create a configuration that uses your subscription key and region. Insert this code in the RecognizeSpeechAsync() method.

Note

This sample uses the FromSubscription() method to build the SpeechConfig. For a full list of available methods, see the SpeechConfig class. The Speech SDK defaults to recognizing speech in en-us. For information on choosing the source language, see Specify source language for speech to text.

// Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

Create an Audio configuration

Now, you need to create an AudioConfig object that points to your audio file. This object is created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the RecognizeSpeechAsync() method, right below your Speech configuration.

using (var audioInput = AudioConfig.FromWavFileInput("whatstheweatherlike.wav"))
{
}

Initialize a SpeechRecognizer

Now, let's create the SpeechRecognizer object using the SpeechConfig and AudioConfig objects created earlier. This object is also created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the RecognizeSpeechAsync() method, inside the using statement that wraps your AudioConfig object.

using (var recognizer = new SpeechRecognizer(config, audioInput))
{
}

Recognize a phrase

From the SpeechRecognizer object, you're going to call the RecognizeOnceAsync() method. This method lets the Speech service know that you're sending a single phrase for recognition, and that once the phrase is identified, it should stop recognizing speech.

Inside the using statement, add this code:

Console.WriteLine("Recognizing first result...");
var result = await recognizer.RecognizeOnceAsync();

Display the recognition results (or errors)

When the recognition result is returned by the Speech service, you'll want to do something with it. We're going to keep it simple and print the result to the console.

Inside the using statement, below RecognizeOnceAsync(), add this code:

switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"We recognized: {result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        break;
}

Check your code

At this point, your code should look like this:

//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace HelloWorld
{
    class Program
    {
        static async Task Main()
        {
            await RecognizeSpeechAsync();
        }

        static async Task RecognizeSpeechAsync()
        {
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

            using (var audioInput = AudioConfig.FromWavFileInput("whatstheweatherlike.wav"))
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                Console.WriteLine("Recognizing first result...");
                var result = await recognizer.RecognizeOnceAsync();

                switch (result.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        Console.WriteLine($"We recognized: {result.Text}");
                        break;
                    case ResultReason.NoMatch:
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        break;
                    case ResultReason.Canceled:
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
                
                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                        break;
                }
            }
        }
    }
}

Build and run your app

Now you're ready to build your app and test speech recognition using the Speech service.

  1. Compile the code: From the menu bar of Visual Studio, choose Build > Build Solution.

  2. Start your app: From the menu bar, choose Debug > Start Debugging, or press F5.

  3. Start recognition: Your audio file is sent to the Speech service, transcribed as text, and rendered in the console.

    Recognizing first result...
    We recognized: What's the weather like?
    

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.


In this quickstart, you use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file takes only a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK C++ samples on GitHub.

Choose your target environment

Prerequisites

Before you get started, make sure to:

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Outside of WAV/PCM, the compressed input formats listed below are also supported. Additional configuration is needed to enable them.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in WAV container
  • MULAW in WAV container

Add sample code

  1. Create a C++ source file named helloworld.cpp, and paste the following code into it.

    
     #include <iostream>
     #include <speechapi_cxx.h>

     using namespace std;
     using namespace Microsoft::CognitiveServices::Speech;
     using namespace Microsoft::CognitiveServices::Speech::Audio;

     int main()
     {
         // Creates an instance of a speech config with specified subscription key and service region.
         // Replace with your own subscription key and service region (e.g., "chinaeast2").
         auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

         // Creates a speech recognizer using a WAV file. The default language is "en-us".
         // Replace with your own audio file name.
         auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
         auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);
         cout << "Recognizing first result...\n";

         // Starts speech recognition, and returns after a single utterance is recognized. The end of a
         // single utterance is determined by listening for silence at the end or until a maximum of 15
         // seconds of audio is processed. The task returns the recognition text as result.
         // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for
         // single-shot recognition like command or query.
         // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
         auto result = recognizer->RecognizeOnceAsync().get();

         // Checks result.
         if (result->Reason == ResultReason::RecognizedSpeech)
         {
             cout << "RECOGNIZED: Text=" << result->Text << std::endl;
         }
         else if (result->Reason == ResultReason::NoMatch)
         {
             cout << "NOMATCH: Speech could not be recognized." << std::endl;
         }
         else if (result->Reason == ResultReason::Canceled)
         {
             auto cancellation = CancellationDetails::FromResult(result);
             cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

             if (cancellation->Reason == CancellationReason::Error)
             {
                 cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                 cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                 cout << "CANCELED: Did you update the subscription info?" << std::endl;
             }
         }

         return 0;
     }
    
    
  2. In this new file, replace the string YourSubscriptionKey with your Speech service subscription key.

  3. Replace the string YourServiceRegion with the region identifier associated with your subscription.

  4. Replace the string whatstheweatherlike.wav with your own filename.

Note

The Speech SDK defaults to recognizing speech in en-us. For information on choosing the source language, see Specify source language for speech to text.

Build the app

Note

Make sure to enter the commands below as a single command line. The easiest way to do that is to copy each command by using its Copy button, and then paste it at your shell prompt.

  • On an x64 (64-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/x64" -l:libasound.so.2
    
  • On an x86 (32-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/x86" -l:libasound.so.2
    
  • On an ARM64 (64-bit) system, run the following command to build the application.

    g++ helloworld.cpp -o helloworld -I "$SPEECHSDK_ROOT/include/cxx_api" -I "$SPEECHSDK_ROOT/include/c_api" --std=c++14 -lpthread -lMicrosoft.CognitiveServices.Speech.core -L "$SPEECHSDK_ROOT/lib/arm64" -l:libasound.so.2
    

Run the app

  1. Configure the loader's library path to point to the Speech SDK library.

    • On an x64 (64-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/x64"
      
    • On an x86 (32-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/x86"
      
    • On an ARM64 (64-bit) system, enter the following command.

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SPEECHSDK_ROOT/lib/arm64"
      
  2. Run the application.

    ./helloworld
    
  3. Your audio file is transmitted to the Speech service, and the first utterance in the file is transcribed to text, which appears in the same window.

    Recognizing first result...
    We recognized: What's the weather like?
    

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file takes only a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK Java samples on GitHub.

Prerequisites

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Outside of WAV/PCM, the compressed input formats listed below are also supported. Additional configuration is needed to enable them.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in WAV container
  • MULAW in WAV container

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.

    Screenshot: the New Java Class window

  3. Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.util.concurrent.Future;
    import com.microsoft.cognitiveservices.speech.*;
    
    /**
     * Quickstart: recognize speech using the Speech SDK for Java.
     */
    public class Main {
    
        /**
         * @param args Arguments are ignored in this sample.
         */
        public static void main(String[] args) {
            try {
                // Replace below with your own subscription key
                String speechSubscriptionKey = "YourSubscriptionKey";
    
                // Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
                String serviceRegion = "YourServiceRegion";
    
                // Replace below with your own filename.
                String audioFileName = "whatstheweatherlike.wav";
    
                int exitCode = 1;
                SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
                assert(config != null);
    
                AudioConfig audioInput = AudioConfig.fromWavFileInput(audioFileName);
                assert(audioInput != null);
    
                SpeechRecognizer reco = new SpeechRecognizer(config, audioInput);
                assert(reco != null);
    
                System.out.println("Recognizing first result...");
    
                Future<SpeechRecognitionResult> task = reco.recognizeOnceAsync();
                assert(task != null);
    
                SpeechRecognitionResult result = task.get();
                assert(result != null);
    
                switch (result.getReason()) {
                    case RecognizedSpeech: {
                            System.out.println("We recognized: " + result.getText());
                            exitCode = 0;
                        }
                        break;
                    case NoMatch:
                        System.out.println("NOMATCH: Speech could not be recognized.");
                        break;
                    case Canceled: {
                            CancellationDetails cancellation = CancellationDetails.fromResult(result);
                            System.out.println("CANCELED: Reason=" + cancellation.getReason());

                            if (cancellation.getReason() == CancellationReason.Error) {
                                System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                                System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                                System.out.println("CANCELED: Did you update the subscription info?");
                            }
                        }
                        break;
                }
    
                reco.close();
    
                System.exit(exitCode);
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
    
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription.

  6. Replace the string whatstheweatherlike.wav with your own filename.

  7. Save changes to the project.

Note

The Speech SDK defaults to recognizing speech in en-us. For information on choosing the source language, see Specify source language for speech to text.

Build and run the app

Press F11, or select Run > Debug. The first 15 seconds of speech input from your audio file will be recognized and logged in the console window.

Recognizing first result...
We recognized: What's the weather like?

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.


In this quickstart, you use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file takes only a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK Python samples on GitHub.

Prerequisites

Before you get started, make sure to:

Supported audio input format

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Outside of WAV/PCM, the compressed input formats listed below are also supported. Additional configuration is needed to enable them.

  • MP3
  • OPUS/OGG
  • FLAC
  • ALAW in WAV container
  • MULAW in WAV container
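Before handing a file to the recognizer, you can confirm it matches the default WAV input format with Python's standard-library wave module. This helper is a sketch for illustration only, not part of the Speech SDK, and the function name is our own.

```python
import wave

def is_default_speech_format(path):
    """Return True if a WAV file matches the Speech SDK's default
    input format: mono, 16-bit PCM, sampled at 8 or 16 kHz."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getsampwidth() == 2            # 16-bit samples
                and w.getframerate() in (8000, 16000)
                and w.getcomptype() == "NONE")       # uncompressed PCM
```

For example, a 44.1 kHz stereo file would fail this check and would need to be resampled (or sent as one of the compressed formats above) before recognition.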

Support and updates

Updates to the Speech SDK Python package are distributed via PyPI and announced in the release notes. If a new version is available, you can update to it with the command pip install --upgrade azure-cognitiveservices-speech. Check which version is currently installed by inspecting the azure.cognitiveservices.speech.__version__ variable.
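If you want to check from a script whether the package is installed, and at which version, without importing it, you can query the package metadata via the standard library. This generic helper is an illustrative sketch, not a Speech SDK API.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a pip package,
    or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# e.g. installed_version("azure-cognitiveservices-speech")
```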

If you have a problem, or you're missing a feature, see Support and help options.

Create a Python application that uses the Speech SDK

Run the sample

You can copy the sample code from this quickstart to a source file quickstart.py and run it in your IDE or in the console:

python quickstart.py

Or you can download this quickstart tutorial as a Jupyter notebook from the Speech SDK sample repository and run it as a notebook.

Sample code

Note

The Speech SDK defaults to recognizing speech in en-us. For information on choosing the source language, see Specify source language for speech to text.

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "whatstheweatherlike.wav"
audio_input = speechsdk.audio.AudioConfig(filename=audio_filename)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

print("Recognizing first result...")

# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result. 
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query. 
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

Install and use the Speech SDK with Visual Studio Code

  1. Download and install a 64-bit version of Python, 3.5 to 3.8, on your computer.

  2. Download and install Visual Studio Code.

  3. Open Visual Studio Code and install the Python extension. Select File > Preferences > Extensions from the menu, then search for Python.

    Screenshot: Install the Python extension

  4. Create a folder to store the project in, for example by using Windows Explorer.

  5. In Visual Studio Code, select the File icon, then open the folder you created.

    Screenshot: Open a folder

  6. Create a new Python source file, speechsdk.py, by selecting the new file icon.

    Screenshot: Create a file

  7. Copy, paste, and save the Python code to the newly created file.

  8. Insert your Speech service subscription information.

  9. If a Python interpreter is already selected, it's shown on the left side of the status bar at the bottom of the window. Otherwise, bring up a list of available Python interpreters: open the command palette (Ctrl+Shift+P) and enter Python: Select Interpreter. Choose an appropriate one.

  10. You can install the Speech SDK Python package from within Visual Studio Code. Do that if it's not installed yet for the Python interpreter you selected. To install the package, open a terminal: bring up the command palette again (Ctrl+Shift+P) and enter Terminal: Create New Integrated Terminal. In the terminal that opens, enter the command python -m pip install azure-cognitiveservices-speech, or the appropriate command for your system.

  11. To run the sample code, right-click somewhere inside the editor and select Run Python File in Terminal. The first 15 seconds of speech input from your audio file will be recognized and logged in the console window.

    Recognizing first result...
    We recognized: What's the weather like?
    

If you have issues following these instructions, refer to the more extensive Visual Studio Code Python tutorial.

Next steps

With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech SDK to recognize speech from an audio file. After satisfying a few prerequisites, recognizing speech from a file takes only a few steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechRecognizer object using the SpeechConfig and AudioConfig objects from above.
  • Using the SpeechRecognizer object, start the recognition process for a single utterance.
  • Inspect the SpeechRecognitionResult returned.

You can view or download all Speech SDK JavaScript samples on GitHub.

Choose your target environment

Prerequisites

Before you get started:

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

    <!DOCTYPE html>
    <html>
    <head>
    <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
    <meta charset="utf-8" />
    </head>
    <body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
    </body>
    </html>

Add UI elements

Now we'll add some basic UI for input boxes, reference the Speech SDK's JavaScript, and grab an authorization token if available.

  <div id="content" style="display:none">
    <table width="100%">
      <tr>
        <td></td>
        <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
      </tr>
      <tr>
        <td align="right"><a href="https://docs.azure.cn/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
        <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
      </tr>
      <tr>
        <td align="right">Region</td>
        <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
      </tr>
      <tr>
        <td align="right">File</td>
        <td><input type="file" id="filePicker" accept=".wav" /></td>
      </tr>
      <tr>
        <td></td>
        <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
      </tr>
      <tr>
        <td align="right" valign="top">Results</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
      </tr>
    </table>
  </div>

  <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>

   <script>
  // Note: Replace the URL with a valid endpoint to retrieve
  //       authorization tokens for your subscription.
  var authorizationEndpoint = "token.php";

  function RequestAuthorizationToken() {
    if (authorizationEndpoint) {
      var a = new XMLHttpRequest();
      a.open("GET", authorizationEndpoint);
      a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      a.send("");
      a.onload = function() {
          var token = JSON.parse(atob(this.responseText.split(".")[1]));
          serviceRegion.value = token.region;
          authorizationToken = this.responseText;
          subscriptionKey.disabled = true;
          subscriptionKey.value = "using authorization token (hit F5 to refresh)";
          console.log("Got an authorization token: " + token);
      }
    }
  }
  </script>
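The `RequestAuthorizationToken` helper above reads the service region out of the token itself: the authorization token is a JWT, so its middle segment is base64-encoded JSON. As a minimal sketch of that decoding step, here is the same logic written for Node.js (using `Buffer` where the browser code uses `atob`); the token below is fabricated for illustration only:

```javascript
// Decode the payload (second segment) of a JWT, as the browser sample
// does with atob(). Buffer is used here so the sketch runs in Node.js.
function decodeJwtPayload(jwt) {
  const payload = jwt.split(".")[1];
  return JSON.parse(Buffer.from(payload, "base64").toString("utf8"));
}

// Fabricated token: header.payload.signature, payload = {"region":"chinaeast2"}
const payload = Buffer.from(JSON.stringify({ region: "chinaeast2" })).toString("base64");
const fakeToken = "eyJhbGciOiJIUzI1NiJ9." + payload + ".signature";

console.log(decodeJwtPayload(fakeToken).region); // → chinaeast2
```

Note that this only *reads* the payload; it does not verify the token's signature, which is the service's job.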
  
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var startRecognizeOnceAsyncButton;

    // subscription key and region for speech services.
    var subscriptionKey, serviceRegion;
    var authorizationToken;
    var SpeechSDK;
    var recognizer;
    var filePicker;
    var audioFile;

    document.addEventListener("DOMContentLoaded", function () {
      startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
      subscriptionKey = document.getElementById("subscriptionKey");
      serviceRegion = document.getElementById("serviceRegion");
      phraseDiv = document.getElementById("phraseDiv");
      filePicker = document.getElementById('filePicker');
      
      filePicker.addEventListener("change", function () {
                audioFile = filePicker.files[0];
            });

      startRecognizeOnceAsyncButton.addEventListener("click", function () {
        startRecognizeOnceAsyncButton.disabled = true;
        phraseDiv.innerHTML = "";

      });

      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        startRecognizeOnceAsyncButton.disabled = false;

        document.getElementById('content').style.display = 'block';
        document.getElementById('warning').style.display = 'none';

        // in case we have a function for getting an authorization token, call it.
        if (typeof RequestAuthorizationToken === "function") {
            RequestAuthorizationToken();
        }
      }
    });
  </script>

创建语音配置Create a Speech configuration

在初始化 SpeechRecognizer 对象之前,需要创建一个使用订阅密钥和订阅区域的配置。Before you can initialize a SpeechRecognizer object, you need to create a configuration that uses your subscription key and subscription region. 将此代码插入 startRecognizeOnceAsyncButton.addEventListener() 方法。Insert this code in the startRecognizeOnceAsyncButton.addEventListener() method.

备注

语音 SDK 默认使用 en-us 作为识别语言。若要了解如何选择源语言,请参阅指定语音转文本的源语言。The Speech SDK defaults to en-us as the recognition language. See Specify source language for speech to text for information on choosing the source language.

        // if we got an authorization token, use the token. Otherwise use the provided subscription key
        var speechConfig;
        if (authorizationToken) {
          speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
        } else {
          if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
            alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
            return;
          }
          speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
        }

        speechConfig.speechRecognitionLanguage = "en-US";

创建音频配置Create an Audio configuration

现在,需要创建指向音频文件的 AudioConfig 对象。Now, you need to create an AudioConfig object that points to your audio file. 将此代码插入语音配置下的 startRecognizeOnceAsyncButton.addEventListener() 方法。Insert this code in the startRecognizeOnceAsyncButton.addEventListener() method, right below your Speech configuration.

        var audioConfig = SpeechSDK.AudioConfig.fromWavFileInput(audioFile);

初始化 SpeechRecognizerInitialize a SpeechRecognizer

现在,使用之前创建的 SpeechConfigAudioConfig 对象创建 SpeechRecognizer 对象。Now, let's create the SpeechRecognizer object using the SpeechConfig and AudioConfig objects created earlier. 将此代码插入 startRecognizeOnceAsyncButton.addEventListener() 方法。Insert this code in the startRecognizeOnceAsyncButton.addEventListener() method.

        recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

识别短语Recognize a phrase

SpeechRecognizer 对象中,我们将调用 recognizeOnceAsync() 方法。From the SpeechRecognizer object, you're going to call the recognizeOnceAsync() method. 此方法是告知语音服务你要发送单个需识别的短语,在确定该短语后会停止识别语音。This method lets the Speech service know that you're sending a single phrase for recognition, and that once the phrase is identified to stop recognizing speech.

        recognizer.recognizeOnceAsync(
          function (result) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += result.text;
            window.console.log(result);

            recognizer.close();
            recognizer = undefined;
          },
          function (err) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += err;
            window.console.log(err);

            recognizer.close();
            recognizer = undefined;
          });
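The success callback above appends `result.text` directly, but a recognition result can also come back empty, for example when no speech was matched in the file. A hedged helper for choosing what to display — it only assumes the plain `text` property, not the SDK's full result object (the real result also carries a `reason` field for finer-grained checks):

```javascript
// Choose a display string from a recognition result.
// Only the `text` property is assumed here; treat an empty or missing
// text as "nothing recognized" rather than appending an empty string.
function displayText(result) {
  if (result && typeof result.text === "string" && result.text.length > 0) {
    return result.text;
  }
  return "(no speech could be recognized)";
}

console.log(displayText({ text: "What's the weather like?" })); // prints: What's the weather like?
```

In the callback you could then write `phraseDiv.innerHTML += displayText(result);` instead of using `result.text` unconditionally.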

查看代码Check your code

<!DOCTYPE html>
<html>
<head>
  <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
  <meta charset="utf-8" />
</head>
<body style="font-family:'Helvetica Neue',Helvetica,Arial,sans-serif; font-size:13px;">
  <!-- <uidiv> -->
  <div id="warning">
    <h1 style="font-weight:500;">Speech Recognition Speech SDK not found (microsoft.cognitiveservices.speech.sdk.bundle.js missing).</h1>
  </div>
  
  <div id="content" style="display:none">
    <table width="100%">
      <tr>
        <td></td>
        <td><h1 style="font-weight:500;">Microsoft Cognitive Services Speech SDK JavaScript Quickstart</h1></td>
      </tr>
      <tr>
        <td align="right"><a href="https://docs.azure.cn/cognitive-services/speech-service/get-started" target="_blank">Subscription</a>:</td>
        <td><input id="subscriptionKey" type="text" size="40" value="subscription"></td>
      </tr>
      <tr>
        <td align="right">Region</td>
        <td><input id="serviceRegion" type="text" size="40" value="YourServiceRegion"></td>
      </tr>
      <tr>
        <td align="right">File</td>
        <td><input type="file" id="filePicker" accept=".wav" /></td>
      </tr>
      <tr>
        <td></td>
        <td><button id="startRecognizeOnceAsyncButton">Start recognition</button></td>
      </tr>
      <tr>
        <td align="right" valign="top">Results</td>
        <td><textarea id="phraseDiv" style="display: inline-block;width:500px;height:200px"></textarea></td>
      </tr>
    </table>
  </div>
  <!-- </uidiv> -->

  <!-- <speechsdkref> -->
  <!-- Speech SDK reference sdk. -->
  <script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
  <!-- </speechsdkref> -->

  <!-- <authorizationfunction> -->
  <!-- Speech SDK Authorization token -->
  <script>
  // Note: Replace the URL with a valid endpoint to retrieve
  //       authorization tokens for your subscription.
  var authorizationEndpoint = "token.php";

  function RequestAuthorizationToken() {
    if (authorizationEndpoint) {
      var a = new XMLHttpRequest();
      a.open("GET", authorizationEndpoint);
      a.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      a.send("");
      a.onload = function() {
          var token = JSON.parse(atob(this.responseText.split(".")[1]));
          serviceRegion.value = token.region;
          authorizationToken = this.responseText;
          subscriptionKey.disabled = true;
          subscriptionKey.value = "using authorization token (hit F5 to refresh)";
          console.log("Got an authorization token: " + token);
      }
    }
  }
  </script>
  <!-- </authorizationfunction> -->

  <!-- <quickstartcode> -->
  <!-- Speech SDK USAGE -->
  <script>
    // status fields and start button in UI
    var phraseDiv;
    var startRecognizeOnceAsyncButton;

    // subscription key and region for speech services.
    var subscriptionKey, serviceRegion;
    var authorizationToken;
    var SpeechSDK;
    var recognizer;
    var filePicker;
    var audioFile;

    document.addEventListener("DOMContentLoaded", function () {
      startRecognizeOnceAsyncButton = document.getElementById("startRecognizeOnceAsyncButton");
      subscriptionKey = document.getElementById("subscriptionKey");
      serviceRegion = document.getElementById("serviceRegion");
      phraseDiv = document.getElementById("phraseDiv");
      filePicker = document.getElementById('filePicker');
      
      filePicker.addEventListener("change", function () {
                audioFile = filePicker.files[0];
            });

      startRecognizeOnceAsyncButton.addEventListener("click", function () {
        startRecognizeOnceAsyncButton.disabled = true;
        phraseDiv.innerHTML = "";

        // if we got an authorization token, use the token. Otherwise use the provided subscription key
        var speechConfig;
        if (authorizationToken) {
          speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, serviceRegion.value);
        } else {
          if (subscriptionKey.value === "" || subscriptionKey.value === "subscription") {
            alert("Please enter your Microsoft Cognitive Services Speech subscription key!");
            return;
          }
          speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey.value, serviceRegion.value);
        }

        speechConfig.speechRecognitionLanguage = "en-US";
        var audioConfig = SpeechSDK.AudioConfig.fromWavFileInput(audioFile);
        recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

        recognizer.recognizeOnceAsync(
          function (result) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += result.text;
            window.console.log(result);

            recognizer.close();
            recognizer = undefined;
          },
          function (err) {
            startRecognizeOnceAsyncButton.disabled = false;
            phraseDiv.innerHTML += err;
            window.console.log(err);

            recognizer.close();
            recognizer = undefined;
          });
      });

      if (!!window.SpeechSDK) {
        SpeechSDK = window.SpeechSDK;
        startRecognizeOnceAsyncButton.disabled = false;

        document.getElementById('content').style.display = 'block';
        document.getElementById('warning').style.display = 'none';

        // in case we have a function for getting an authorization token, call it.
        if (typeof RequestAuthorizationToken === "function") {
            RequestAuthorizationToken();
        }
      }
    });
  </script>
  <!-- </quickstartcode> -->
</body>
</html>

创建令牌源(可选)Create the token source (optional)

如果要在 web 服务器上承载网页,可以为演示应用程序提供令牌源。In case you want to host the web page on a web server, you can optionally provide a token source for your demo application. 这样一来,订阅密钥永远不会离开服务器,并且用户可以在不输入任何授权代码的情况下使用语音功能。That way, your subscription key will never leave your server while allowing users to use speech capabilities without entering any authorization code themselves.

创建名为 token.php 的新文件。Create a new file named token.php. 此示例假设 Web 服务器在启用 cURL 的情况下支持 PHP 脚本语言。In this example, we assume your web server supports the PHP scripting language with cURL enabled. 输入以下代码:Enter the following code:

<?php
header('Access-Control-Allow-Origin: ' . $_SERVER['SERVER_NAME']);

// Replace with your own subscription key and service region (e.g., "chinaeast2").
$subscriptionKey = 'YourSubscriptionKey';
$region = 'YourServiceRegion';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://' . $region . '.api.cognitive.azure.cn/sts/v1.0/issueToken');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, '{}');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json', 'Ocp-Apim-Subscription-Key: ' . $subscriptionKey));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
?>

备注

授权令牌仅具有有限的生存期。Authorization tokens only have a limited lifetime. 此简化示例不显示如何自动刷新授权令牌。This simplified example does not show how to refresh authorization tokens automatically. 作为用户,你可以手动重载页面或点击 F5 刷新。As a user, you can manually reload the page or hit F5 to refresh.
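The manual-refresh limitation noted above can be worked around by re-fetching the token on a timer. A minimal sketch — the ten-minute lifetime in the usage comment is an assumption (check your service's actual token lifetime), and `fetchToken` stands in for a function like the `RequestAuthorizationToken` defined earlier:

```javascript
// Re-fetch the authorization token at 80% of its lifetime, leaving a
// safety margin before it expires. Returns the interval id so the
// caller can stop refreshing with clearInterval().
function scheduleTokenRefresh(fetchToken, lifetimeMs) {
  fetchToken(); // fetch an initial token right away
  return setInterval(fetchToken, Math.floor(lifetimeMs * 0.8));
}

// Usage sketch, assuming a ten-minute token lifetime:
// const timer = scheduleTokenRefresh(RequestAuthorizationToken, 10 * 60 * 1000);
```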

在本地生成和运行示例Build and run the sample locally

要启动应用,双击 index.html 文件,或使用你喜欢的 Web 浏览器打开 index.html。To launch the app, double-click the index.html file or open index.html with your favorite web browser. 随即显示的简单 GUI 允许你输入订阅密钥和区域,选择 .wav 文件并触发识别。It will present a simple GUI that lets you enter your subscription key and region, choose a .wav file, and trigger recognition.

备注

此方法对 Safari 浏览器不起作用。This method doesn't work on the Safari browser. 在 Safari 上,示例网页需要托管在 Web 服务器上;Safari 不允许从本地文件加载的网站使用麦克风。On Safari, the sample web page needs to be hosted on a web server; Safari doesn't allow websites loaded from a local file to use the microphone.

通过 web 服务器生成并运行示例Build and run the sample via a web server

要启动应用,请打开你喜欢的 Web 浏览器,将其指向承载该文件夹的公共 URL,输入你的区域,选择 .wav 文件并触发识别。To launch your app, open your favorite web browser, point it to the public URL that you host the folder on, enter your region, choose a .wav file, and trigger recognition. 配置令牌源后,应用将从该令牌源获取令牌。If configured, it will acquire a token from your token source.

后续步骤Next steps

了解语音识别的这些基本知识后,请继续浏览基础知识,了解语音 SDK 中的常见功能和任务。With this base knowledge of speech recognition, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

在本快速入门中,你将使用命令行中的语音 CLI 识别音频文件中记录的语音,并生成文本脚本。In this quickstart, you use the Speech CLI from the command line to recognize speech recorded in an audio file, and produce a text transcription. 可轻松使用语音 CLI 来执行常见识别任务,例如转录对话。It's easy to use the Speech CLI to perform common recognition tasks, such as transcribing conversations. 经过一次性配置后,可通过语音 CLI 使用麦克风以交互方式将音频转录为文本,或使用批处理脚本从文件进行转录。After a one-time configuration, the Speech CLI lets you transcribe audio into text interactively with a microphone or from files using a batch script.

下载并安装Download and install

按照以下步骤在 Windows 上安装语音 CLI:Follow these steps to install the Speech CLI on Windows:

  1. 下载语音 CLI zip 存档然后提取它。Download the Speech CLI zip archive, then extract it.
  2. 转到从下载中提取的根目录 spx-zips,并提取所需的子目录(spx-net471 用于 .NET Framework 4.7,spx-netcore-win-x64 用于 x64 CPU 上的 .NET Core 3.0)。Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

在命令提示符中,将目录更改到此位置,然后键入 spx 查看语音 CLI 的帮助。In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

备注

在 Windows 上,语音 CLI 只能显示本地计算机上命令提示符适用的字体。On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows 终端支持通过语音 CLI 以交互方式生成的所有字体。Windows Terminal supports all fonts produced interactively by the Speech CLI. 如果输出到文件,文本编辑器(例如记事本)或 web 浏览器(例如 Microsoft Edge)也可以显示所有字体。If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

备注

查找命令时,PowerShell 不会检查本地目录。PowerShell does not check the local directory when looking for a command. 在 PowerShell 中,将目录更改为 spx 所在的位置,并通过输入 .\spx 调用该工具。In PowerShell, change directory to the location of spx and call the tool by entering .\spx. 如果将此目录添加到路径,则 PowerShell 和 Windows 命令提示符可以从任何目录中查找 spx,而无需包含 .\ 前缀。If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory without including the .\ prefix.

创建订阅配置Create subscription config

若要开始使用语音 CLI,首先需要输入语音订阅密钥和区域信息。To start using the Speech CLI, you first need to enter your Speech subscription key and region information. 请查看区域支持页,找到你的区域标识符。See the region support page to find your region identifier. 获得订阅密钥和区域标识符(例如 chinaeast2)后,运行以下命令。Once you have your subscription key and region identifier (for example, chinaeast2), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

现在会存储订阅身份验证,用于将来的 SPX 请求。Your subscription authentication is now stored for future SPX requests. 如果需要删除这些已存储值中的任何一个,请运行 spx config @region --clearspx config @key --clearIf you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

查找包含语音的文件Find a file that contains speech

语音 CLI 可以识别多种文件格式和自然语言的语音。The Speech CLI can recognize speech in many file formats and natural languages. 对于本快速入门,可以使用包含英语语音的 WAV 文件(16kHz 或 8kHz,16 位,mono PCM)。For this quickstart, you can use a WAV file (16kHz or 8kHz, 16-bit, and mono PCM) that contains English speech.

  1. 下载 whatstheweatherlike.wav 文件。Download the whatstheweatherlike.wav file.
  2. whatstheweatherlike.wav 文件复制到语音 CLI 二进制文件所在的目录中。Copy the whatstheweatherlike.wav file to the same directory as the Speech CLI binary file.

运行语音 CLIRun the Speech CLI

现在,可以运行语音 CLI 来识别声音文件中的语音。Now you're ready to run the Speech CLI to recognize speech found in the sound file.

在命令行中,更改为包含语音 CLI 二进制文件的目录,然后键入:From the command line, change to the directory that contains the Speech CLI binary file, and type:

spx recognize --file whatstheweatherlike.wav

备注

语音 CLI 默认为英语。The Speech CLI defaults to English. 你可以从“语音转文本”表中选择不同语言。You can choose a different language from the Speech-to-text table. 例如,添加 --source de-DE 以识别德语语音。For example, add --source de-DE to recognize German speech.

语音 CLI 将在屏幕上显示语音的文本转录。The Speech CLI will show a text transcription of the speech on the screen. 然后,语音 CLI 将关闭。Then the Speech CLI will close.

后续步骤Next steps

继续浏览基础知识,了解语音 CLI 的其他功能。Continue exploring the basics to learn about other features of the Speech CLI.

查看或下载 GitHub 上所有的语音 SDK 示例View or download all Speech SDK Samples on GitHub.

其他语言和平台支持Additional language and platform support

重要

需要语音 SDK 1.11.0 或更高版本。Speech SDK version 1.11.0 or later is required.

如果已单击此选项卡,则可能看不到采用你偏好的编程语言的快速入门。If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. 别担心,我们在 GitHub 上提供了其他快速入门材料和代码示例。Don't worry, we have additional quickstart materials and code samples available on GitHub. 使用表格查找适用于编程语言和平台/OS 组合的相应示例。Use the table to find the right sample for your programming language and platform/OS combination.

语言Language 其他快速入门Additional Quickstarts 代码示例Code samples
C#C# 来自麦克风来自 blobFrom mic, From blob .NET Framework.NET CoreUWPUnityXamarin.NET Framework, .NET Core, UWP, Unity, Xamarin
C++C++ 来自麦克风来自 blobFrom mic, From blob WindowsLinuxmacOSWindows, Linux, macOS
JavaJava 来自麦克风来自 blobFrom mic, From blob AndroidJREAndroid, JRE
JavaScriptJavaScript 在浏览器上识别来自麦克风的语音在 Node.js 上识别来自文件的语音Browser from mic, Node.js from file Windows、Linux 和 macOSWindows, Linux, macOS
Objective-CObjective-C 在 iOS 上识别来自麦克风的语音、在 macOS 上识别来自麦克风的语音iOS from mic, macOS from mic iOS、macOSiOS, macOS
PythonPython 来自麦克风来自 blobFrom mic, From blob Windows、Linux 和 macOSWindows, Linux, macOS
SwiftSwift 在 iOS 上识别来自麦克风的语音、在 macOS 上识别来自麦克风的语音iOS from mic, macOS from mic iOS、macOSiOS, macOS