Quickstart: Synthesize speech into an audio file

Important

Speech SDK version 1.11.0 or later is required.

In this quickstart, you will use the Speech SDK to convert text to synthesized speech in an audio file. The text-to-speech service provides numerous options for synthesized voices; see text-to-speech language support. After satisfying a few prerequisites, synthesizing speech into a file only takes five steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechSynthesizer object using the configuration objects from above.
  • Using the SpeechSynthesizer object, convert your text into synthesized speech, saving it into the audio file specified.
  • Inspect the SpeechSynthesisResult returned for errors.

If you prefer to jump right in, view or download all Speech SDK C# Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

  1. Launch Visual Studio 2019.
  2. Load your project and open Program.cs.

Start with some boilerplate code

Let's add some code that works as a skeleton for our project. Note that you've created an async method called SynthesisToAudioFileAsync().


using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace helloworld
{
    class Program
    {
        public static async Task SynthesisToAudioFileAsync()
        {
        }

        static void Main()
        {
            SynthesisToAudioFileAsync().Wait();
        }
    }
}
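
The skeleton blocks on .Wait() because a classic static void Main can't use await. As an optional alternative (a minimal sketch, assuming your project targets C# 7.1 or later), you could declare an async Main instead:

static async Task Main()
{
    await SynthesisToAudioFileAsync();
}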

Create a Speech configuration

Before you can initialize a SpeechSynthesizer object, you need to create a configuration that uses your subscription key and subscription region. Insert this code in the SynthesisToAudioFileAsync() method.

// Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
// The default language is "en-us".
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

Create an Audio configuration

Now, you need to create an AudioConfig object that points to your audio file. This object is created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the SynthesisToAudioFileAsync() method, right below your Speech configuration.

var fileName = "helloworld.wav";
using (var fileOutput = AudioConfig.FromWavFileOutput(fileName))
{
}

Initialize a SpeechSynthesizer

Now, let's create the SpeechSynthesizer object using the SpeechConfig and AudioConfig objects created earlier. This object is also created inside of a using statement to ensure the proper release of unmanaged resources. Insert this code in the SynthesisToAudioFileAsync() method, inside the using statement that wraps your AudioConfig object.

using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
{
}

Synthesize text using SpeakTextAsync

From the SpeechSynthesizer object, you're going to call the SpeakTextAsync() method. This method sends your text to the Speech service, which converts it to audio. The SpeechSynthesizer uses the service's default voice unless you explicitly set a synthesis voice on the SpeechConfig (see the optional snippet below).

Inside the using statement, add this code:

var text = "Hello world!";
var result = await synthesizer.SpeakTextAsync(text);
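
Optionally, you can request a specific voice instead of the service default. The snippet below is a minimal sketch rather than part of the original sample: set the property on your SpeechConfig right after it is created (before the SpeechSynthesizer is constructed), and treat the voice name as an example to verify against the text-to-speech language support list for your region.

// Optional: request a specific voice instead of the default.
// Set this on the config before creating the SpeechSynthesizer.
// The voice name below is only an example; confirm it is available in your region.
config.SpeechSynthesisVoiceName = "en-US-AriaNeural";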

Check for errors

When the synthesis result is returned by the Speech service, you should check to make sure your text was successfully synthesized.

Inside the using statement, below SpeakTextAsync(), add this code:

if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine($"Speech synthesized to [{fileName}] for text [{text}]");
}
else if (result.Reason == ResultReason.Canceled)
{
    var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

    if (cancellation.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
        Console.WriteLine($"CANCELED: Did you update the subscription info?");
    }
}

Check your code

At this point, your code should look like this:

//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
//

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace helloworld
{
    class Program
    {
        public static async Task SynthesisToAudioFileAsync()
        {
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

            var fileName = "helloworld.wav";
            using (var fileOutput = AudioConfig.FromWavFileOutput(fileName))
            {
                using (var synthesizer = new SpeechSynthesizer(config, fileOutput))
                {
                    var text = "Hello world!";
                    var result = await synthesizer.SpeakTextAsync(text);

                    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                    {
                        Console.WriteLine($"Speech synthesized to [{fileName}] for text [{text}]");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                    }
                }
            }
        }

        static void Main()
        {
            SynthesisToAudioFileAsync().Wait();
        }
    }
}

Build and run your app

Now you're ready to build your app and test speech synthesis using the Speech service.

  1. Compile the code - From the menu bar of Visual Studio, choose Build > Build Solution.

  2. Start your app - From the menu bar, choose Debug > Start Debugging, or press F5.

  3. Start synthesis - Your text is converted to speech and saved to the audio file you specified.

    Speech synthesized to [helloworld.wav] for text [Hello world!]
    

Next steps

With this base knowledge of speech synthesis, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you will use the Speech SDK to convert text to synthesized speech in an audio file. The text-to-speech service provides numerous options for synthesized voices; see text-to-speech language support. After satisfying a few prerequisites, synthesizing speech into a file only takes five steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechSynthesizer object using the configuration objects from above.
  • Using the SpeechSynthesizer object, convert your text into synthesized speech, saving it into the audio file specified.
  • Inspect the SpeechSynthesisResult returned for errors.

If you prefer to jump right in, view or download all Speech SDK C++ Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started, make sure to:

Add sample code

  1. Open the source file helloworld.cpp.

  2. Replace all the code with the following snippet:

    
     #include <iostream>
     #include <speechapi_cxx.h>

     using namespace std;
     using namespace Microsoft::CognitiveServices::Speech;
     using namespace Microsoft::CognitiveServices::Speech::Audio;

     int main()
     {
         // Creates an instance of a speech config with specified subscription key and service region.
         // Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
         auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

         // Creates a speech synthesizer using a file as audio output.
         // Replace with your own audio file name.
         auto fileName = "helloworld.wav";
         auto fileOutput = AudioConfig::FromWavFileOutput(fileName);
         auto synthesizer = SpeechSynthesizer::FromConfig(config, fileOutput);

         // Converts the specified text to speech, saving the audio data in the file specified above.
         // Replace with your own text.
         auto text = "Hello world!";
         auto result = synthesizer->SpeakTextAsync(text).get();

         // Checks result for successful completion or errors.
         if (result->Reason == ResultReason::SynthesizingAudioCompleted)
         {
             cout << "Speech synthesized to [" << fileName << "] for text [" << text << "]" << std::endl;
         }
         else if (result->Reason == ResultReason::Canceled)
         {
             auto cancellation = SpeechSynthesisCancellationDetails::FromResult(result);
             cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;

             if (cancellation->Reason == CancellationReason::Error)
             {
                 cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                 cout << "CANCELED: ErrorDetails=[" << cancellation->ErrorDetails << "]" << std::endl;
                 cout << "CANCELED: Did you update the subscription info?" << std::endl;
             }
         }

         return 0;
     }
    
    
  3. In the same file, replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription (for example, chinaeast2 for the trial subscription).

  5. Replace the string helloworld.wav with your own file name.

  6. From the menu bar, choose File > Save All.

Build and run the application

  1. From the menu bar, select Build > Build Solution to build the application. The code should compile without errors now.

  2. Choose Debug > Start Debugging (or press F5) to start the helloworld application.

  3. Your text is converted to speech and saved to the audio file you specified.

    Speech synthesized to [helloworld.wav] for text [Hello world!]
    

Next steps

With this base knowledge of speech synthesis, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you will use the Speech SDK to convert text to synthesized speech in an audio file. The text-to-speech service provides numerous options for synthesized voices; see text-to-speech language support. After satisfying a few prerequisites, synthesizing speech into a file only takes five steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechSynthesizer object using the configuration objects from above.
  • Using the SpeechSynthesizer object, convert your text into synthesized speech, saving it into the audio file specified.
  • Inspect the SpeechSynthesisResult returned for errors.

If you prefer to jump right in, view or download all Speech SDK Java Samples on GitHub. Otherwise, let's get started.

Prerequisites

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.


  3. Main.java 中的所有代码替换为以下代码片段:Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.util.concurrent.Future;
    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    
    /**
     * Quickstart: synthesize speech into an audio file using the Speech SDK for Java.
     */
    public class Main {
    
        /**
         * @param args Arguments are ignored in this sample.
         */
        public static void main(String[] args) {
            try {
                // Replace below with your own subscription key
                String speechSubscriptionKey = "YourSubscriptionKey";
    
                // Replace below with your own region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
                String serviceRegion = "YourServiceRegion";
    
                // Replace below with your own filename.
                String audioFileName = "helloworld.wav";
    
                // Replace below with your own text.
                String text = "Hello world!";
    
                int exitCode = 1;
                SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
                assert(config != null);
    
                AudioConfig audioOutput = AudioConfig.fromWavFileOutput(audioFileName);
                assert(audioOutput != null);
    
                SpeechSynthesizer synth = new SpeechSynthesizer(config, audioOutput);
                assert(synth != null);
    
                Future<SpeechSynthesisResult> task = synth.SpeakTextAsync(text);
                assert(task != null);
    
                SpeechSynthesisResult result = task.get();
                assert(result != null);
    
                if (result.getReason() == ResultReason.SynthesizingAudioCompleted) {
                    System.out.println("Speech synthesized to [" + audioFilename + "] for text [" + text + "]");
                    exitCode = 0;
                }
                else if (result.getReason() == ResultReason.Canceled) {
                    SpeechSynthesisCancellationDetails cancellation = SpeechSynthesisCancellationDetails.fromResult(result);
                    System.out.println("CANCELED: Reason=" + cancellation.getReason());
    
                    if (cancellation.getReason() == CancellationReason.Error) {
                        System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                        System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                        System.out.println("CANCELED: Did you update the subscription info?");
                    }
                }
    
                result.close();
                synth.close();
    
                System.exit(exitCode);
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
    
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, chinaeast2 for the trial subscription).

  6. Replace the string helloworld.wav with your own file name.

  7. Replace the string Hello world! with your own text.

  8. Save changes to the project.

Build and run the app

Press F11, or select Run > Debug. Your text is converted to speech and saved to the audio file you specified.

Speech synthesized to [helloworld.wav] for text [Hello world!]

Next steps

With this base knowledge of speech synthesis, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you will use the Speech SDK to convert text to synthesized speech in an audio file. The text-to-speech service provides numerous options for synthesized voices; see text-to-speech language support. After satisfying a few prerequisites, synthesizing speech into a file only takes five steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechSynthesizer object using the configuration objects from above.
  • Using the SpeechSynthesizer object, convert your text into synthesized speech, saving it into the audio file specified.
  • Inspect the SpeechSynthesisResult returned for errors.

If you prefer to jump right in, view or download all Speech SDK Python Samples on GitHub. Otherwise, let's get started.

Prerequisites

  • An Azure subscription key for the Speech service. Get one for trial.
  • Python 3.5 to 3.8.
  • Speech SDK version 1.11.0 or later is required.
  • The Python Speech SDK package is available for these operating systems:
    • Windows: x64 and x86.
    • Mac: macOS X version 10.12 or later.
    • Linux: Ubuntu 16.04/18.04, Debian 9, RHEL 7/8, CentOS 7/8 on x64.
  • On Linux, run these commands to install the required packages:
sudo apt-get update
sudo apt-get install build-essential libssl1.0.0 libasound2

Install the Speech SDK

Important

By downloading any of the Azure Cognitive Services Speech SDKs, you acknowledge its license. For more information, see:

This command installs the Python package from PyPI for the Speech SDK:

pip install azure-cognitiveservices-speech

Support and updates

Updates to the Speech SDK Python package are distributed via PyPI and announced in the Release notes. If a new version is available, you can update to it with the command pip install --upgrade azure-cognitiveservices-speech. Check which version is currently installed by inspecting the azure.cognitiveservices.speech.__version__ variable.

If you have a problem, or you're missing a feature, see Support and help options.

Create a Python application that uses the Speech SDK

Run the sample

You can copy the sample code from this quickstart to a source file quickstart.py and run it in your IDE or in the console:

python quickstart.py

Or you can download this quickstart tutorial as a Jupyter notebook from the Speech SDK sample repository and run it as a notebook.

Sample code

import azure.cognitiveservices.speech as speechsdk

# Replace with your own subscription key and region identifier from here: https://docs.azure.cn/cognitive-services/speech-service/regions
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "helloworld.wav"
audio_output = speechsdk.audio.AudioOutputConfig(filename=audio_filename)

# Creates a synthesizer with the given settings
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output)

# Synthesizes the text to speech.
# Replace with your own text.
text = "Hello world!"
result = speech_synthesizer.speak_text_async(text).get()

# Checks result.
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized to [{}] for text [{}]".format(audio_filename, text))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech synthesis canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print("Error details: {}".format(cancellation_details.error_details))
    print("Did you update the subscription info?")

Install and use the Speech SDK with Visual Studio Code

  1. Download and install a 64-bit version of Python, 3.5 to 3.8, on your computer.

  2. Download and install Visual Studio Code.

  3. Open Visual Studio Code and install the Python extension. Select File > Preferences > Extensions from the menu. Search for Python.


  4. Create a folder to store the project in. For example, use Windows Explorer.

  5. In Visual Studio Code, select the File icon. Then open the folder you created.


  6. Create a new Python source file, speechsdk.py, by selecting the new file icon.


  7. Copy, paste, and save the Python code to the newly created file.

  8. Insert your Speech service subscription information.

  9. If a Python interpreter is already selected, it appears on the left side of the status bar at the bottom of the window. Otherwise, bring up a list of available Python interpreters: open the command palette (Ctrl+Shift+P), enter Python: Select Interpreter, and choose an appropriate one.

  10. You can install the Speech SDK Python package from within Visual Studio Code. Do that if it's not already installed for the Python interpreter you selected. To install the Speech SDK package, open a terminal: bring up the command palette again (Ctrl+Shift+P) and enter Terminal: Create New Integrated Terminal. In the terminal that opens, enter the command python -m pip install azure-cognitiveservices-speech, or the appropriate command for your system.

  11. To run the sample code, right-click somewhere inside the editor. Select Run Python File in Terminal. Your text is converted to speech and saved to the audio file you specified.

    Speech synthesized to [helloworld.wav] for text [Hello world!]
    

If you have issues following these instructions, refer to the more extensive Visual Studio Code Python tutorial.

Next steps

With this base knowledge of speech synthesis, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you will use the Speech SDK to convert text to synthesized speech in an audio file. The text-to-speech service provides numerous options for synthesized voices; see text-to-speech language support. After satisfying a few prerequisites, synthesizing speech into a file only takes five steps:

  • Create a SpeechConfig object from your subscription key and region.
  • Create an AudioConfig object that specifies the .WAV file name.
  • Create a SpeechSynthesizer object using the configuration objects from above.
  • Using the SpeechSynthesizer object, convert your text into synthesized speech, saving it into the audio file specified.
  • Inspect the SpeechSynthesisResult returned for errors.

If you prefer to jump right in, view or download all Speech SDK JavaScript Samples on GitHub. Otherwise, let's get started.

Prerequisites

Before you get started:

Start with some boilerplate code

Let's add some code that works as a skeleton for our project. Create an index.js file and add this code.

Be sure to fill in your values for subscriptionKey, serviceRegion, and filename.

(function() {
  "use strict";
  
  // pull in the required packages.
  var sdk = require("microsoft-cognitiveservices-speech-sdk");
  var fs = require("fs");
  
  // replace with your own subscription key,
  // service region (e.g., "chinaeast2"), and
  // the name of the output file for
  // the synthesized audio.
  var subscriptionKey = "YourSubscriptionKey";
  var serviceRegion = "YourServiceRegion"; // e.g., "chinaeast2"
  var filename = "YourAudioFile.wav"; // 16000 Hz, Mono
 
}());
  

Load the file into a PullAudioOutputStream

For Node.js, the Speech SDK doesn't natively support file access directly, so we'll open the file and write to it using a PullAudioOutputStream.

// create the pull stream we need for the speech sdk.
  var pullStream = sdk.AudioOutputStream.createPullStream();
  
  // open the file and connect it to the pull stream.
  fs.createWriteStream(filename).on('data', function(arrayBuffer) {
    pullStream.read(arrayBuffer.slice());
  }).on('end', function() {
    pullStream.close();
  });

Create a Speech configuration

Before you can initialize a SpeechSynthesizer object, you need to create a configuration that uses your subscription key and subscription region. Insert this code next.

Note

The Speech SDK will default to recognizing using en-us for the language; see Specify source language for speech to text for information on choosing the source language.

  // now create the audio-config pointing to our stream and
 // the speech config specifying the language.
 var speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
 
 // setting the recognition language to English.
 speechConfig.speechRecognitionLanguage = "en-US";
 

Create an Audio configuration

Now, you need to create an AudioConfig object that points to your PullAudioOutputStream. Insert this code right below your Speech configuration.

    var audioConfig = sdk.AudioConfig.fromStreamOutput(pullStream);

Initialize a SpeechSynthesizer

Now, let's create the SpeechSynthesizer object using the SpeechConfig and AudioConfig objects created earlier.

  // create the speech synthesizer.
  var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
  

Synthesize text and display results

From the SpeechSynthesizer object, you're going to call the speakTextAsync() method. This method lets the Speech service know that you're sending text for synthesis.

We'll also write the returned result, or any errors, to the console, and finally close the synthesizer.

 // we are done with the setup
  var text = "Hello World"
  console.log("Now sending text '" + text + "' to: " + filename);
  
  // start the synthesizer and wait for a result.
  synthesizer.speakTextAsync(
    text,
    function (result) {
      console.log(result);
  
      synthesizer.close();
      synthesizer = undefined;
    },
    function (err) {
      console.trace("err - " + err);
  
      synthesizer.close();
      synthesizer = undefined;
    },
    filename);

Check your code

(function() {
  "use strict";
  
  // pull in the required packages.
  var sdk = require("microsoft-cognitiveservices-speech-sdk");
  var fs = require("fs");
  
  // replace with your own subscription key,
  // service region (e.g., "chinaeast2"), and
  // the name of the output file for
  // the synthesized audio.
  var subscriptionKey = "YourSubscriptionKey";
  var serviceRegion = "YourServiceRegion"; // e.g., "chinaeast2"
  var filename = "YourAudioFile.wav"; // 16000 Hz, Mono
  
  // create the pull stream we need for the speech sdk.
  var pullStream = sdk.AudioOutputStream.createPullStream();
  
  // open the file and write it to the pull stream.
  fs.createWriteStream(filename).on('data', function(arrayBuffer) {
    pullStream.read(arrayBuffer.slice());
  }).on('end', function() {
    pullStream.close();
  });
 
  // now create the audio-config pointing to our stream and
  // the speech config specifying the language.
  var speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
  
  // setting the recognition language to English.
  speechConfig.speechRecognitionLanguage = "en-US";
  
  var audioConfig = sdk.AudioConfig.fromStreamOutput(pullStream);
  
  // create the speech synthesizer.
  var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);
  
 // we are done with the setup
  var text = "Hello World"
  console.log("Now sending text '" + text + "' to: " + filename);
  
  // start the synthesizer and wait for a result.
  synthesizer.speakTextAsync(
    text,
    function (result) {
      console.log(result);
  
      synthesizer.close();
      synthesizer = undefined;
    },
    function (err) {
      console.trace("err - " + err);
  
      synthesizer.close();
      synthesizer = undefined;
    },
    filename);

}());

Run the sample locally

Execute the code using Node.js:

node index.js

Next steps

With this base knowledge of speech synthesis, continue exploring the basics to learn about common functionality and tasks within the Speech SDK.

In this quickstart, you use the Speech CLI from the command line to convert text to speech stored in an audio file. The text-to-speech service provides many options for synthesized voices; see text-to-speech language support. After a one-time configuration, the Speech CLI lets you synthesize speech from text using commands from the command line.

Prerequisites

The only prerequisite is an Azure Speech subscription. If you don't already have one, see the guide on creating a new subscription.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install either .NET Framework 4.7 or .NET Core 3.0.
  2. Download the Speech CLI zip archive, then extract it.
  3. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.

Create subscription config

To start using the Speech CLI, you first need to enter your Speech subscription key and region information. See the region support page to find your region identifier. Once you have your subscription key and region identifier (for example, chinaeast2), run the following commands.

spx config @key --set YOUR-SUBSCRIPTION-KEY
spx config @region --set YOUR-REGION-ID

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Run the Speech CLI

Now you're ready to run the Speech CLI to synthesize speech from text into a new audio file.

From the command line, change to the directory that contains the Speech CLI binary file, and type:

spx synthesize --text "The speech synthesizer greets you!" --audio output greetings.wav

The Speech CLI will produce synthesized English speech in the greetings.wav audio file. In Windows, you can play the audio file by entering start greetings.wav.
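
The CLI can also synthesize with a specific voice. The command below is a sketch: the voice name is only an example, and the --voice option is an assumption to verify against your Speech CLI version's help output (spx synthesize --help).

spx synthesize --text "The speech synthesizer greets you!" --voice "en-US-AriaNeural" --audio output greetings.wav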

Next steps

Continue exploring the basics to learn about other features of the Speech CLI.

View or download all Speech SDK Samples on GitHub.

Additional language and platform support

Important

Speech SDK version 1.11.0 or later is required.

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language      Additional Quickstarts              Code samples
C#            To a speaker                        .NET Framework, .NET Core, UWP, Unity, Xamarin
C++           To a speaker                        Windows, Linux, macOS
Java          To a speaker                        Android, JRE
JavaScript    Node.js to an audio file            Windows, Linux, macOS
Objective-C   iOS to speaker, macOS to speaker    iOS, macOS
Python        To a speaker                        Windows, Linux, macOS
Swift         iOS to speaker, macOS to speaker    iOS, macOS