Text-to-speech REST API

The Speech service lets you convert text into synthesized speech and get a list of supported voices for a region using a set of REST APIs. Each available endpoint is associated with a region. You need a subscription key for the endpoint/region you plan to use.

The text-to-speech REST API supports neural and standard text-to-speech voices, each of which supports a specific language and dialect, identified by locale.

  • For a complete list of voices, see language support.
  • For information about regional availability, see regions.

Important

Costs vary for standard and neural voices. For more information, see Pricing.

Before using this API, understand:

  • The text-to-speech REST API requires an Authorization header. This means that you need to complete a token exchange to access the service. For more information, see Authentication.

Authentication

Each request requires an authorization header. This table shows which headers are supported for each service:

Supported authorization headers | Speech-to-text | Text-to-speech
Ocp-Apim-Subscription-Key | Yes | No
Authorization: Bearer | Yes | Yes

When using the Ocp-Apim-Subscription-Key header, you only need to provide your subscription key. For example:

'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'

When using the Authorization: Bearer header, you must first make a request to the issueToken endpoint. In this request, you exchange your subscription key for an access token that's valid for 10 minutes. The next few sections explain how to get a token and how to use it.

How to get an access token

To get an access token, make a request to the issueToken endpoint using the Ocp-Apim-Subscription-Key header and your subscription key.

The issueToken endpoint has this format:

https://<REGION_IDENTIFIER>.api.cognitive.azure.cn/sts/v1.0/issueToken

Replace <REGION_IDENTIFIER> with the identifier matching the region of your subscription from this table:

Geography | Region | Region identifier
China | China East 2 | chinaeast2
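
As a minimal sketch, the substitution described above could be wrapped in a small helper. The function name issue_token_url and the SUPPORTED_REGIONS set are hypothetical names introduced for illustration; only the URL format and the chinaeast2 identifier come from this article.

```python
# Hypothetical helper: the names below are illustrative, not part of the service.
SUPPORTED_REGIONS = {"chinaeast2"}  # region identifiers listed in the table above


def issue_token_url(region: str) -> str:
    """Return the issueToken endpoint for a region identifier."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region identifier: {region!r}")
    return f"https://{region}.api.cognitive.azure.cn/sts/v1.0/issueToken"


print(issue_token_url("chinaeast2"))
```

Rejecting unknown identifiers early tends to give a clearer error than a failed DNS lookup later on.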

Use these samples to create your access token request.

HTTP sample

This example is a simple HTTP request to get a token. Replace YOUR_SUBSCRIPTION_KEY with your Speech service subscription key. The example is set to the China East 2 region; if your subscription is in a different region, replace the Host header with your region's host name.

POST /sts/v1.0/issueToken HTTP/1.1
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
Host: chinaeast2.api.cognitive.azure.cn
Content-type: application/x-www-form-urlencoded
Content-Length: 0

The body of the response contains the access token in JSON Web Token (JWT) format.

PowerShell sample

This example is a simple PowerShell script to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your Speech service subscription key. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to China East 2.

$FetchTokenHeader = @{
  'Content-type'='application/x-www-form-urlencoded';
  'Content-Length'= '0';
  'Ocp-Apim-Subscription-Key' = 'YOUR_SUBSCRIPTION_KEY'
}

$OAuthToken = Invoke-RestMethod -Method POST -Uri https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issueToken `
 -Headers $FetchTokenHeader

# show the token received
$OAuthToken

cURL sample

cURL is a command-line tool available in Linux (and in the Windows Subsystem for Linux). This cURL command shows how to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your Speech service subscription key. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to China East 2.

curl -v -X POST \
 "https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issueToken" \
 -H "Content-type: application/x-www-form-urlencoded" \
 -H "Content-Length: 0" \
 -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY"

C# sample

This C# class shows how to get an access token. Pass your Speech service subscription key when you instantiate the class. If your subscription isn't in the China East 2 region, change the value of FetchTokenUri to match the region for your subscription.

using System;
using System.Net.Http;
using System.Threading.Tasks;

public class Authentication
{
    public static readonly string FetchTokenUri =
        "https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issueToken";
    private string subscriptionKey;
    private string token;

    public Authentication(string subscriptionKey)
    {
        this.subscriptionKey = subscriptionKey;
        this.token = FetchTokenAsync(FetchTokenUri, subscriptionKey).Result;
    }

    public string GetAccessToken()
    {
        return this.token;
    }

    private async Task<string> FetchTokenAsync(string fetchUri, string subscriptionKey)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            UriBuilder uriBuilder = new UriBuilder(fetchUri);

            var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
            Console.WriteLine("Token Uri: {0}", uriBuilder.Uri.AbsoluteUri);
            return await result.Content.ReadAsStringAsync();
        }
    }
}

Python sample

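# The requests module must be installed.
# Run "pip install requests" if necessary.
import requests

subscription_key = 'REPLACE_WITH_YOUR_KEY'


def get_token(subscription_key):
    """Exchange a subscription key for an access token (valid for 10 minutes)."""
    fetch_token_url = 'https://chinaeast2.api.cognitive.azure.cn/sts/v1.0/issueToken'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    response = requests.post(fetch_token_url, headers=headers)
    response.raise_for_status()  # fail early on 401 and other error responses
    # The response body is the token itself, returned as text.
    return response.text


print(get_token(subscription_key))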

How to use an access token

Send the access token to the service as the Authorization: Bearer <TOKEN> header. Each access token is valid for 10 minutes. You can get a new token at any time; however, to minimize network traffic and latency, we recommend using the same token for nine minutes.

Here's a sample HTTP request to the text-to-speech REST API:

POST /cognitiveservices/v1 HTTP/1.1
Authorization: Bearer YOUR_ACCESS_TOKEN
Host: chinaeast2.tts.speech.azure.cn
Content-type: application/ssml+xml
Content-Length: 199
Connection: Keep-Alive

// Message body here...
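
The nine-minute reuse recommendation can be sketched as a small cache. This is one possible approach, not a prescribed one: TokenCache and its parameters are hypothetical names, and fetch_token stands for any callable that returns a fresh token string (for example, a function that posts to the issueToken endpoint).

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token for up to max_age_seconds before fetching a new one."""

    def __init__(self, fetch_token: Callable[[], str],
                 max_age_seconds: float = 9 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_token = fetch_token  # hypothetical: returns a new token string
        self._max_age = max_age_seconds  # nine minutes, below the 10-minute validity
        self._clock = clock              # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token

    def authorization_header(self) -> dict:
        return {"Authorization": "Bearer " + self.get()}
```

Using a monotonic clock avoids surprises when the system time is adjusted while a token is cached.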

Get a list of voices

The voices/list endpoint allows you to get a full list of voices for a specific region/endpoint.

Regions and endpoints

Region | Endpoint
China East 2 | https://chinaeast2.tts.speech.azure.cn/cognitiveservices/voices/list

Request headers

This table lists required and optional headers for requests to this endpoint.

Header | Description | Required / Optional
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Required

Request body

A body isn't required for GET requests to this endpoint.

Sample request

This request only requires an authorization header.

GET /cognitiveservices/voices/list HTTP/1.1
Host: chinaeast2.tts.speech.azure.cn
Authorization: Bearer [Base64 access_token]
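
For illustration, the same request could be built in Python with the standard library. This is a sketch under assumptions: build_voices_request is a hypothetical helper name, and YOUR_ACCESS_TOKEN is a placeholder for a token obtained from the issueToken endpoint. The request is only constructed here, not sent.

```python
import urllib.request

VOICES_LIST_URL = "https://chinaeast2.tts.speech.azure.cn/cognitiveservices/voices/list"


def build_voices_request(access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the voices/list endpoint."""
    return urllib.request.Request(
        VOICES_LIST_URL,
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


request = build_voices_request("YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
# To send it: body = urllib.request.urlopen(request).read()
```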

Sample response

This response has been truncated to illustrate the structure of a response.

Note

Voice availability varies by region/endpoint.

[
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
        "ShortName": "ar-EG-Hoda",
        "Gender": "Female",
        "Locale": "ar-EG",
        "SampleRateHertz": "16000",
        "VoiceType": "Standard"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ar-SA, Naayf)",
        "ShortName": "ar-SA-Naayf",
        "Gender": "Male",
        "Locale": "ar-SA",
        "SampleRateHertz": "16000",
        "VoiceType": "Standard"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (bg-BG, Ivan)",
        "ShortName": "bg-BG-Ivan",
        "Gender": "Male",
        "Locale": "bg-BG",
        "SampleRateHertz": "16000",
        "VoiceType": "Standard"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ca-ES, HerenaRUS)",
        "ShortName": "ca-ES-HerenaRUS",
        "Gender": "Female",
        "Locale": "ca-ES",
        "SampleRateHertz": "16000",
        "VoiceType": "Standard"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
        "ShortName": "zh-CN-XiaoxiaoNeural",
        "Gender": "Female",
        "Locale": "zh-CN",
        "SampleRateHertz": "24000",
        "VoiceType": "Neural"
    },

    ...
]
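
A response with this structure can be filtered after parsing it as JSON. The sketch below uses two entries copied from the sample above; find_voices is a hypothetical helper name. Note that SampleRateHertz arrives as a string in the sample, so convert it with int() before comparing numerically.

```python
import json

# Two entries copied from the sample response above.
sample_response = """[
    {"Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
     "ShortName": "ar-EG-Hoda", "Gender": "Female", "Locale": "ar-EG",
     "SampleRateHertz": "16000", "VoiceType": "Standard"},
    {"Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
     "ShortName": "zh-CN-XiaoxiaoNeural", "Gender": "Female", "Locale": "zh-CN",
     "SampleRateHertz": "24000", "VoiceType": "Neural"}
]"""


def find_voices(voices, locale=None, voice_type=None):
    """Return the ShortName of each voice matching the given filters."""
    return [
        voice["ShortName"]
        for voice in voices
        if (locale is None or voice["Locale"] == locale)
        and (voice_type is None or voice["VoiceType"] == voice_type)
    ]


voices = json.loads(sample_response)
print(find_voices(voices, voice_type="Neural"))  # ['zh-CN-XiaoxiaoNeural']
```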

HTTP status codes

The HTTP status code for each response indicates success or common errors.

HTTP status code | Description | Possible reason
200 | OK | The request was successful.
400 | Bad Request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long.
401 | Unauthorized | The request is not authorized. Check to make sure your subscription key or token is valid and in the correct region.
429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription.
502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers.

Convert text to speech

The v1 endpoint allows you to convert text to speech using Speech Synthesis Markup Language (SSML).

Regions and endpoints

These regions are supported for text-to-speech using the REST API. Make sure that you select the endpoint that matches your subscription region.

Standard and neural voices

Use this table to determine availability of standard and neural voices by region/endpoint:

Region | Endpoint | Standard voices | Neural voices
China East 2 | https://chinaeast2.tts.speech.azure.cn/cognitiveservices/v1 | Yes | Yes

Request headers

This table lists required and optional headers for text-to-speech requests.

Header | Description | Required / Optional
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Required
Content-Type | Specifies the content type for the provided text. Accepted value: application/ssml+xml. | Required
X-Microsoft-OutputFormat | Specifies the audio output format. For a complete list of accepted values, see audio outputs. | Required
User-Agent | The application name. The value provided must be less than 255 characters. | Required
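
The four required headers could be assembled as in the sketch below. build_tts_headers is a hypothetical helper name, and "my-tts-client" is a placeholder application name; the header names, the content type, and the 255-character limit come from the table above.

```python
def build_tts_headers(access_token: str, output_format: str, user_agent: str) -> dict:
    """Assemble the required headers for a text-to-speech request."""
    # The table above requires the User-Agent value to be less than 255 characters.
    if not user_agent or len(user_agent) >= 255:
        raise ValueError("User-Agent must be non-empty and less than 255 characters")
    return {
        "Authorization": "Bearer " + access_token,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": user_agent,
    }


headers = build_tts_headers(
    "YOUR_ACCESS_TOKEN", "riff-24khz-16bit-mono-pcm", "my-tts-client"
)
print(sorted(headers))
```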

Audio outputs

The following audio formats are supported. Send one of them in each request as the X-Microsoft-OutputFormat header. Each format incorporates a bitrate and encoding type. The Speech service supports 24 kHz, 16 kHz, and 8 kHz audio outputs.

raw-16khz-16bit-mono-pcm raw-8khz-8bit-mono-mulaw
riff-8khz-8bit-mono-alaw riff-8khz-8bit-mono-mulaw
riff-16khz-16bit-mono-pcm audio-16khz-128kbitrate-mono-mp3
audio-16khz-64kbitrate-mono-mp3 audio-16khz-32kbitrate-mono-mp3
raw-24khz-16bit-mono-pcm riff-24khz-16bit-mono-pcm
audio-24khz-160kbitrate-mono-mp3 audio-24khz-96kbitrate-mono-mp3
audio-24khz-48kbitrate-mono-mp3

Note

If your selected voice and output format have different bit rates, the audio is resampled as necessary. However, 24 kHz voices do not support audio-16khz-16kbps-mono-siren and riff-16khz-16kbps-mono-siren output formats.
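
To check in advance whether a voice and an output format line up, a client could compare the voice's SampleRateHertz value with the rate in the format name. This is a sketch built on two assumptions that this article does not state outright: that the "NNkhz" part of a format name is the output sample rate, and that a mismatch with the voice's rate is what leads to resampling. Both function names are hypothetical.

```python
import re


def format_sample_rate_hz(output_format: str) -> int:
    """Extract the sample rate from a name such as 'riff-24khz-16bit-mono-pcm'."""
    match = re.search(r"-(\d+)khz-", output_format)
    if match is None:
        raise ValueError(f"no sample rate found in {output_format!r}")
    return int(match.group(1)) * 1000


def needs_resampling(voice: dict, output_format: str) -> bool:
    """Assumption: resampling happens when the two rates differ."""
    return int(voice["SampleRateHertz"]) != format_sample_rate_hz(output_format)


neural_voice = {"ShortName": "zh-CN-XiaoxiaoNeural", "SampleRateHertz": "24000"}
print(needs_resampling(neural_voice, "raw-16khz-16bit-mono-pcm"))
print(needs_resampling(neural_voice, "riff-24khz-16bit-mono-pcm"))
```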

Request body

The body of each POST request is sent as Speech Synthesis Markup Language (SSML). SSML allows you to choose the voice and language of the synthesized speech returned by the text-to-speech service. For a complete list of supported voices, see language support.

Sample request

This HTTP request uses SSML to specify the voice and language. The body cannot exceed 1,000 characters.

POST /cognitiveservices/v1 HTTP/1.1
X-Microsoft-OutputFormat: raw-16khz-16bit-mono-pcm
Content-Type: application/ssml+xml
Host: chinaeast2.tts.speech.azure.cn
Content-Length: 225
Authorization: Bearer [Base64 access_token]

<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female'
    name='en-US-AriaRUS'>
        Microsoft Speech Service Text-to-Speech API
</voice></speak>

See our quickstarts for language-specific examples.

HTTP status codes

The HTTP status code for each response indicates success or common errors.

HTTP status code | Description | Possible reason
200 | OK | The request was successful; the response body is an audio file.
400 | Bad Request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long.
401 | Unauthorized | The request is not authorized. Check to make sure your subscription key or token is valid and in the correct region.
413 | Request Entity Too Large | The SSML input is longer than 1024 characters.
415 | Unsupported Media Type | It's possible that the wrong Content-Type was provided. Content-Type should be set to application/ssml+xml.
429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription.
502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers.
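
One way a client might act on these codes is to retry the ones that can be transient and fail fast on the rest. The sketch below is a design assumption, not service guidance: treating 429 and 502 as retryable, the exponential backoff, and the names call_with_retry and send are all illustrative. Here send is any callable that performs the request and returns a (status_code, body) pair.

```python
import time

RETRYABLE_STATUSES = {429, 502}  # assumption: these may succeed on a later attempt


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns 200, retrying 429/502 with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... between attempts
```

Codes such as 400, 401, 413, and 415 point at the request itself, so repeating the same request is unlikely to help.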

If the HTTP status is 200 OK, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
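
When a raw PCM format is requested, it can be convenient to wrap the audio in a WAV container before saving it, since many players expect a header. The sketch below assumes that raw-16khz-16bit-mono-pcm yields headerless 16-bit mono samples at 16 kHz, which is inferred from the format name rather than stated in this article; pcm_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm_bytes: bytes, sample_rate_hz: int = 16000,
               sample_width_bytes: int = 2, channels: int = 1) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width_bytes)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# Stand-in for a response body: 10 ms of silence (160 frames at 16 kHz).
wav_bytes = pcm_to_wav(b"\x00\x00" * 160)
print(len(wav_bytes))
# To save it: open("speech.wav", "wb").write(wav_bytes)
```

The riff-* formats in the list above already name a container, so this step would only matter for raw-* output.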
