Quickstart: Convert text to speech using Node.js

In this quickstart, you'll learn how to convert text to speech using Node.js and the text-to-speech REST API. The request body in this guide is structured as Speech Synthesis Markup Language (SSML), which lets you choose the voice and language of the response.

This quickstart requires an Azure Cognitive Services account with a Speech service resource. If you don't have an account, you can use the trial to get a subscription key.

Prerequisites

This quickstart requires Node.js with npm, an IDE or text editor, and a subscription key for a Speech service resource.

Create a project and require dependencies

Create a new Node.js project using your favorite IDE or editor. Then copy this code snippet into a file named tts.js in your project.

// Requires request and request-promise for HTTP requests
// e.g. npm install request request-promise
const rp = require('request-promise');
// Requires fs to write synthesized speech to a file
const fs = require('fs');
// Requires readline-sync to read command line inputs
const readline = require('readline-sync');
// Requires xmlbuilder to build the SSML body
const xmlbuilder = require('xmlbuilder');

Note

If you haven't used these modules before, you'll need to install them before running your program. To install these packages, run: npm install request request-promise xmlbuilder readline-sync. (fs is built into Node.js and doesn't need to be installed.)

Get an access token

The text-to-speech REST API requires an access token for authentication. To get one, you exchange your Speech service subscription key for an access token at the issueToken endpoint, which is what this function does.

This sample assumes that your Speech service subscription is in the China East 2 region. If you're using a different region, update the value for uri. For a full list, see Regions.
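The token endpoint used here and the synthesis endpoint used later in textToSpeech differ only in their region prefix. As a sketch, assuming every region follows the same host pattern as the chinaeast2 URLs in this quickstart (check the Regions list before relying on that), a hypothetical endpointsForRegion helper could derive both URLs from a region name, so changing regions means changing one string:

```javascript
// Hypothetical helper (not part of the quickstart): derives both endpoints from a region name.
// Assumes other regions use the same host pattern as the chinaeast2 URLs in this quickstart.
function endpointsForRegion(region) {
    // Reject anything that doesn't look like a region identifier, e.g. "china east".
    if (!/^[a-z0-9]+$/.test(region)) {
        throw new Error(`Unexpected region name: ${region}`);
    }
    return {
        // Value for uri in getAccessToken.
        issueTokenUri: `https://${region}.api.cognitive.azure.cn/sts/v1.0/issueToken`,
        // Value for baseUrl in textToSpeech (the path 'cognitiveservices/v1' is appended to it).
        ttsBaseUrl: `https://${region}.tts.speech.azure.cn/`,
    };
}

console.log(endpointsForRegion('chinaeast2').issueTokenUri);
// → https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issueToken
```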

Copy this code into your project:

// Gets an access token.
function getAccessToken(subscriptionKey) {
    let options = {
        method: 'POST',
        uri: 'https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issueToken',
        headers: {
            'Ocp-Apim-Subscription-Key': subscriptionKey
        }
    }
    return rp(options);
}
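According to the Speech service documentation, each access token is valid for 10 minutes, so an app that makes many synthesis requests doesn't need to call getAccessToken every time. The sketch below is a hypothetical createTokenCache helper, not part of the quickstart. It wraps any token-fetching function (for example, getAccessToken), reuses the token for 9 minutes to leave a safety margin, and lets concurrent callers share one in-flight request. The now parameter exists only so the clock can be replaced in tests.

```javascript
// Hypothetical token cache (not part of the quickstart).
// Assumes a 10-minute token lifetime and refreshes after 9 minutes as a safety margin.
const TOKEN_REUSE_MS = 9 * 60 * 1000;

function createTokenCache(fetchToken, now = Date.now) {
    let token = null;
    let fetchedAt = 0;
    let pending = null; // one shared in-flight request for concurrent callers

    return async function getToken() {
        // Reuse the cached token while it is still inside the reuse window.
        if (token !== null && now() - fetchedAt < TOKEN_REUSE_MS) {
            return token;
        }
        // Start a refresh only if one isn't already running.
        if (!pending) {
            pending = Promise.resolve(fetchToken())
                .then((fresh) => {
                    token = fresh;
                    fetchedAt = now();
                    return fresh;
                })
                .finally(() => {
                    pending = null;
                });
        }
        return pending;
    };
}
```

In main, you might create the cache once with const getToken = createTokenCache(() => getAccessToken(subscriptionKey)); and then call textToSpeech(await getToken(), text) for each request.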

In the next section, you'll create the function that calls the text-to-speech API and saves the synthesized speech response.

Make a request and save the response

Here you'll build the request to the text-to-speech API and save the speech response. This sample assumes you're using the China East 2 endpoint. If your resource is registered to a different region, make sure you update baseUrl. For more information, see Speech service regions.

Next, add the required headers for the request. Make sure you update User-Agent with the name of your resource (located in the Azure portal), and set X-Microsoft-OutputFormat to your preferred audio output. For a full list of output formats, see Audio outputs.
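To make the header requirements explicit, here is a hypothetical buildTtsHeaders helper (not part of the quickstart) that returns the same header set textToSpeech sends, with the resource name and output format as parameters. The default format is the one this sample uses; any other value is assumed to come from the Audio outputs list.

```javascript
// Hypothetical helper (not part of the quickstart) that assembles the request headers.
// resourceName becomes the User-Agent; outputFormat defaults to this sample's format.
function buildTtsHeaders(accessToken, resourceName, outputFormat = 'riff-24khz-16bit-mono-pcm') {
    if (!accessToken) throw new Error('accessToken is required');
    if (!resourceName) throw new Error('resourceName is required (sent as User-Agent)');
    return {
        'Authorization': 'Bearer ' + accessToken,
        'cache-control': 'no-cache',
        'User-Agent': resourceName,
        'X-Microsoft-OutputFormat': outputFormat,
        'Content-Type': 'application/ssml+xml',
    };
}
```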

Then construct the request body using Speech Synthesis Markup Language (SSML). This sample builds the SSML structure with xmlbuilder and inserts the text passed to textToSpeech as the spoken content.
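xmlbuilder's .txt() escapes the text for you. If you build the SSML as a plain string instead, you have to escape XML special characters yourself, or input such as "Tom & Jerry" produces an invalid request body. The sketch below is a hypothetical string-based alternative (buildSsml and escapeXml are not part of the quickstart); unlike xmlbuilder's output, it does not emit an XML declaration.

```javascript
// Escapes the five XML special characters so user text can't break the SSML document.
// & must be replaced first so the other entities aren't double-escaped.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');
}

// Hypothetical string-based counterpart to the xmlbuilder code in textToSpeech.
function buildSsml(text, voiceName, lang = 'en-us') {
    return `<speak version="1.0" xml:lang="${escapeXml(lang)}">` +
        `<voice xml:lang="${escapeXml(lang)}" name="${escapeXml(voiceName)}">` +
        escapeXml(text) +
        '</voice></speak>';
}

console.log(buildSsml('Tom & Jerry', 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'));
// → <speak version="1.0" xml:lang="en-us"><voice xml:lang="en-us" name="Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)">Tom &amp; Jerry</voice></speak>
```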

Note

This sample uses the Guy24KRUS voice font. For a complete list of Microsoft-provided voices and languages, see Language support.

Finally, you'll make a request to the service. If the request succeeds and returns a 200 status code, the speech response is written to TTSOutput.wav.

// Make sure to update User-Agent with the name of your resource.
// You can also change the voice and output formats. See:
// https://docs.azure.cn/cognitive-services/speech-service/language-support#text-to-speech
function textToSpeech(accessToken, text) {
    // Create the SSML request.
    let xml_body = xmlbuilder.create('speak')
        .att('version', '1.0')
        .att('xml:lang', 'en-us')
        .ele('voice')
        .att('xml:lang', 'en-us')
        .att('name', 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)')
        .txt(text)
        .end();
    // Convert the XML into a string to send in the TTS request.
    let body = xml_body.toString();

    let options = {
        method: 'POST',
        baseUrl: 'https://chinaeast2.tts.speech.azure.cn/',
        url: 'cognitiveservices/v1',
        headers: {
            'Authorization': 'Bearer ' + accessToken,
            'cache-control': 'no-cache',
            'User-Agent': 'YOUR_RESOURCE_NAME',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'Content-Type': 'application/ssml+xml'
        },
        body: body
    }

    let request = rp(options)
        .on('response', (response) => {
            if (response.statusCode === 200) {
                // Stream the audio to disk and report only once the file is fully written.
                request.pipe(fs.createWriteStream('TTSOutput.wav'))
                    .on('finish', () => console.log('\nYour file is ready.\n'));
            }
        });
    return request;
}
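The request and request-promise packages used above are deprecated. If you're on Node.js 18 or later, which ships a global fetch, the same call can be made without them. The following is a sketch of that substitution, not the quickstart's method: textToSpeechWithFetch is a hypothetical name, it takes an already-built SSML string (for example, the one produced by xmlbuilder), and it reuses this sample's endpoint, headers, and output file name.

```javascript
const fs = require('fs');

// Sketch: textToSpeech rewritten with Node's built-in fetch (Node.js 18+) instead of request-promise.
// Endpoint, headers, and default output file mirror the request-promise version above.
async function textToSpeechWithFetch(accessToken, ssmlBody, outFile = 'TTSOutput.wav') {
    const response = await fetch('https://chinaeast2.tts.speech.azure.cn/cognitiveservices/v1', {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer ' + accessToken,
            'cache-control': 'no-cache',
            'User-Agent': 'YOUR_RESOURCE_NAME',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'Content-Type': 'application/ssml+xml',
        },
        body: ssmlBody,
    });
    // fetch doesn't reject on HTTP errors, so check the status explicitly.
    if (response.status !== 200) {
        throw new Error(`Synthesis failed: ${response.status} ${response.statusText}`);
    }
    // The body is binary audio, so read it as bytes rather than text.
    const audio = Buffer.from(await response.arrayBuffer());
    await fs.promises.writeFile(outFile, audio);
    return audio.length; // number of bytes written
}
```

Because the whole response is buffered before writing, this version is simplest for short prompts; for long audio, streaming the body to disk would use less memory.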

Put it all together

You're almost done. The last step is to create an asynchronous function. This function reads your subscription key from an environment variable, prompts for text, gets a token and waits for that request to complete, and then converts the text to speech and saves the audio as a .wav file.

If you're unfamiliar with environment variables, or prefer to test with your subscription key hardcoded as a string, replace process.env.SPEECH_SERVICE_KEY with your subscription key as a string.

// Use async and await to get the token before attempting
// to convert text to speech.
async function main() {
    // Reads the subscription key from an environment variable.
    // If you prefer not to use an environment variable, you can replace this
    // with a string containing your subscription key,
    // e.g. const subscriptionKey = "your_key_here";
    const subscriptionKey = process.env.SPEECH_SERVICE_KEY;
    if (!subscriptionKey) {
        throw new Error('Environment variable for your subscription key is not set.')
    };
    // Prompts the user to input text.
    const text = readline.question('What would you like to convert to speech? ');

    try {
        const accessToken = await getAccessToken(subscriptionKey);
        await textToSpeech(accessToken, text);
    } catch (err) {
        console.log(`Something went wrong: ${err}`);
    }
}

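// The missing-key error is thrown outside main's try block, so handle it here.
main().catch((err) => console.error(err.message));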

Run the sample app

That's it. You're ready to run your text-to-speech sample app. From the command line (or a terminal session), navigate to your project directory and run:

node tts.js

When prompted, type the text you'd like to convert to speech. If the conversion succeeds, the speech file (TTSOutput.wav) is saved in your project folder. Play it using your favorite media player.
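Because the request asks for riff-24khz-16bit-mono-pcm, a successful run should produce a RIFF/WAVE file that reports 1 channel, 24000 Hz, and 16 bits per sample. If your player refuses the file, a quick sanity check is to read its header. This hypothetical describeWav helper is a sketch that assumes the canonical PCM layout where the "fmt " chunk directly follows "WAVE"; files with extra chunks before "fmt " are reported as null.

```javascript
// Hypothetical sanity check (not part of the quickstart) for the synthesized file.
// Assumes the canonical PCM header layout: "RIFF", size, "WAVE", then the "fmt " chunk.
function describeWav(buffer) {
    if (buffer.length < 36 ||
        buffer.toString('ascii', 0, 4) !== 'RIFF' ||
        buffer.toString('ascii', 8, 12) !== 'WAVE' ||
        buffer.toString('ascii', 12, 16) !== 'fmt ') {
        return null; // not a RIFF/WAVE file with the expected layout
    }
    return {
        channels: buffer.readUInt16LE(22),      // 1 = mono
        sampleRate: buffer.readUInt32LE(24),    // samples per second
        bitsPerSample: buffer.readUInt16LE(34), // sample depth
    };
}
```

After node tts.js has finished, you could run console.log(describeWav(require('fs').readFileSync('TTSOutput.wav'))) from the project folder.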

Clean up resources

Make sure to remove any confidential information, such as subscription keys, from your sample app's source code.

Next steps

See also