Long Audio API (Preview)

The Long Audio API is designed for asynchronous synthesis of long-form text to speech (for example: audio books, news articles, and documents). This API doesn't return synthesized audio in real time; instead, you poll for the response(s) and consume the output(s) as the service makes them available. Unlike the text to speech API used by the Speech SDK, the Long Audio API can create synthesized audio longer than 10 minutes, making it ideal for publishers and audio content platforms.

Additional benefits of the Long Audio API:

  • Synthesized speech returned by the service uses the best neural voices.
  • There's no need to deploy a voice endpoint, because the service synthesizes voices in a non-real-time batch mode.

Note

The Long Audio API now supports public neural voices.

Workflow

Typically, when using the Long Audio API, you submit a text file to be synthesized, poll for the status, and then, if the status is successful, download the audio output.

This diagram provides a high-level overview of the workflow.

Long Audio API workflow diagram

Prepare content for synthesis

When preparing your text file, make sure it:

  • Is either plain text or SSML text
  • Is encoded as UTF-8 with a byte-order marker (BOM)
  • Contains more than 400 characters
  • Contains fewer than 10,000 paragraphs

Note

For Chinese (Mainland), Chinese (Hong Kong SAR), Chinese (Taiwan), Japanese, and Korean, one word is counted as two characters.
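As an illustration of how this counting rule affects quota estimates, the sketch below computes a rough billable character count by doubling characters in common CJK, kana, and hangul code-point ranges. The function name and the exact ranges are our own approximation, not something documented by the service.

```python
def billable_characters(text):
    """Rough billable-character estimate: characters in common Chinese,
    Japanese, and Korean ranges count as two, everything else as one.
    The specific ranges below are an assumption, not the service's rule."""
    double_ranges = (
        (0x3040, 0x30FF),   # Japanese hiragana and katakana
        (0x3400, 0x4DBF),   # CJK Unified Ideographs Extension A
        (0x4E00, 0x9FFF),   # CJK Unified Ideographs
        (0xAC00, 0xD7AF),   # Hangul syllables
    )
    total = 0
    for ch in text:
        cp = ord(ch)
        total += 2 if any(lo <= cp <= hi for lo, hi in double_ranges) else 1
    return total

print(billable_characters("Hello"))   # 5
print(billable_characters("你好"))    # 4 (each ideograph counts twice)
```

This can be useful for checking an input file against the 400-character minimum before submitting it.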

Python example

This section contains Python examples that show the basic usage of the Long Audio API. Create a new Python project using your favorite IDE or editor, then copy this code snippet into a file named voice_synthesis_client.py.

import argparse
import json
import ntpath
import urllib3
import requests
import time
from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

These libraries are used to parse arguments, construct the HTTP request, and call the text to speech Long Audio REST API.

Get a list of supported voices

This code gets the full list of voices you can use for a specific region/endpoint. Add the code to voice_synthesis_client.py:

parser = argparse.ArgumentParser(description='Text-to-speech client tool to submit voice synthesis requests.')
parser.add_argument('--voices', action="store_true", default=False, help='print voice list')
parser.add_argument('-key', action="store", dest="key", required=True, help='the speech subscription key, like fg1f763i01d94768bda32u7a******** ')
parser.add_argument('-region', action="store", dest="region", required=True, help='the region information, could be centralindia, canadacentral or uksouth')
args = parser.parse_args()
baseAddress = 'https://%s.customvoice.api.speech.azure.cn/api/texttospeech/v3.0-beta1/' % args.region

def getVoices():
    response=requests.get(baseAddress+"voicesynthesis/voices", headers={"Ocp-Apim-Subscription-Key":args.key}, verify=False)
    voices = json.loads(response.text)
    return voices

if args.voices:
    voices = getVoices()
    print("There are %d voices available:" % len(voices))
    for voice in voices:
        print ("Name: %s, Description: %s, Id: %s, Locale: %s, Gender: %s, PublicVoice: %s, Created: %s" % (voice['name'], voice['description'], voice['id'], voice['locale'], voice['gender'], voice['isPublicVoice'], voice['created']))

Run the script using the command python voice_synthesis_client.py --voices -key <your_key> -region <region>, and replace the following values:

  • Replace <your_key> with your Speech service subscription key. This information is available in the Overview tab for your resource in the Azure portal.
  • Replace <region> with the region where your Speech resource was created (for example: chinaeast or chinanorth). This information is available in the Overview tab for your resource in the Azure portal.

You'll see an output that looks like this:

There are xx voices available:

Name: Microsoft Server Speech Text to Speech Voice (en-US, xxx), Description: xxx , Id: xxx, Locale: en-US, Gender: Male, PublicVoice: xxx, Created: 2019-07-22T09:38:14Z
Name: Microsoft Server Speech Text to Speech Voice (zh-CN, xxx), Description: xxx , Id: xxx, Locale: zh-CN, Gender: Female, PublicVoice: xxx, Created: 2019-08-26T04:55:39Z

If the PublicVoice parameter is True, the voice is a public neural voice. Otherwise, it's a custom neural voice.
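If you only want to work with one kind of voice, you can partition the list returned by getVoices() on that flag. A minimal sketch (the helper name is ours; it assumes each voice dict carries the isPublicVoice field shown in the output above):

```python
def split_voices(voices):
    """Partition a voice list into public neural voices and custom neural
    voices, based on the isPublicVoice flag returned by the service."""
    public = [v for v in voices if v.get('isPublicVoice')]
    custom = [v for v in voices if not v.get('isPublicVoice')]
    return public, custom

# Example with stand-in data shaped like the service response:
voices = [
    {'name': 'VoiceA', 'isPublicVoice': True},
    {'name': 'VoiceB', 'isPublicVoice': False},
]
public, custom = split_voices(voices)
print([v['name'] for v in public])   # ['VoiceA']
print([v['name'] for v in custom])   # ['VoiceB']
```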

Convert text to speech

Prepare an input text file, in either plain text or SSML text, then add the following code to voice_synthesis_client.py:

Note

'concatenateResult' is an optional parameter. If this parameter isn't set, the audio outputs are generated per paragraph. You can also concatenate the audio into one output by setting the parameter. By default, the audio output is set to riff-16khz-16bit-mono-pcm. For more information about supported audio outputs, see Audio output formats.

parser.add_argument('--submit', action="store_true", default=False, help='submit a synthesis request')
parser.add_argument('--concatenateResult', action="store_true", default=False, help='If concatenate result in a single wave file')
parser.add_argument('-file', action="store", dest="file", help='the input text script file path')
parser.add_argument('-voiceId', action="store", nargs='+', dest="voiceId", help='the id of the voice which used to synthesis')
parser.add_argument('-locale', action="store", dest="locale", help='the locale information like zh-CN/en-US')
parser.add_argument('-format', action="store", dest="format", default='riff-16khz-16bit-mono-pcm', help='the output audio format')

def submitSynthesis():
    modelList = args.voiceId
    data={'name': 'simple test', 'description': 'desc...', 'models': json.dumps(modelList), 'locale': args.locale, 'outputformat': args.format}
    if args.concatenateResult:
        properties={'ConcatenateResult': 'true'}
        data['properties'] = json.dumps(properties)
    files = None  # avoid a NameError if no input file is given
    if args.file is not None:
        scriptfilename=ntpath.basename(args.file)
        files = {'script': (scriptfilename, open(args.file, 'rb'), 'text/plain')}
    response = requests.post(baseAddress+"voicesynthesis", data, headers={"Ocp-Apim-Subscription-Key":args.key}, files=files, verify=False)
    if response.status_code == 202:
        location = response.headers['Location']
        id = location.split("/")[-1]
        print("Submit synthesis request successful")
        return id
    else:
        print("Submit synthesis request failed")
        print("response.status_code: %d" % response.status_code)
        print("response.text: %s" % response.text)
        return 0

def getSubmittedSynthesis(id):
    response=requests.get(baseAddress+"voicesynthesis/"+id, headers={"Ocp-Apim-Subscription-Key":args.key}, verify=False)
    synthesis = json.loads(response.text)
    return synthesis

if args.submit:
    id = submitSynthesis()
    if (id == 0):
        exit(1)

    while(1):
        print("\r\nChecking status")
        synthesis=getSubmittedSynthesis(id)
        if synthesis['status'] == "Succeeded":
            r = requests.get(synthesis['resultsUrl'])
            filename=id + ".zip"
            with open(filename, 'wb') as f:  
                f.write(r.content)
                print("Succeeded... Result file downloaded : " + filename)
            break
        elif synthesis['status'] == "Failed":
            print("Failed...")
            break
        elif synthesis['status'] == "Running":
            print("Running...")
        elif synthesis['status'] == "NotStarted":
            print("NotStarted...")
        time.sleep(10)

Run the script using the command python voice_synthesis_client.py --submit -key <your_key> -region <region> -file <input> -locale <locale> -voiceId <voice_guid>, and replace the following values:

  • Replace <your_key> with your Speech service subscription key. This information is available in the Overview tab for your resource in the Azure portal.
  • Replace <region> with the region where your Speech resource was created (for example: chinaeast or chinanorth). This information is available in the Overview tab for your resource in the Azure portal.
  • Replace <input> with the path to the text file you've prepared for text to speech.
  • Replace <locale> with the desired output locale. For more information, see language support.
  • Replace <voice_guid> with the desired output voice. Use one of the voices returned by your previous call to the /voicesynthesis/voices endpoint.

You'll see an output that looks like this:

Submit synthesis request successful

Checking status
NotStarted...

Checking status
Running...

Checking status
Running...

Checking status
Succeeded... Result file downloaded : xxxx.zip

The result contains the input text and the audio output files that are generated by the service. You can download these files in a zip.

Note

If you have more than one input file, you'll need to submit multiple requests. There are some limitations to be aware of:

  • The client is allowed to submit up to 5 requests to the server per second for each Azure subscription account. If this limit is exceeded, the client gets a 429 error code (too many requests). Reduce the number of requests per second.
  • The server is allowed to run and queue up to 120 requests for each Azure subscription account. If this limit is exceeded, the server returns a 429 error code (too many requests). Wait and avoid submitting new requests until some requests are completed.
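One way to stay under these limits is to retry with a backoff whenever the service answers 429. The sketch below wraps any callable that returns a response object; the wrapper name and the exponential backoff schedule are our own choices, not part of the API.

```python
import time

def with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a response whose status_code isn't 429,
    backing off exponentially between attempts (1s, 2s, 4s, ...).
    Raises RuntimeError if every attempt is throttled."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError("still throttled after %d attempts" % max_attempts)

# Usage sketch: fake responses stand in for requests.post/get results,
# so the example runs without calling the service.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

answers = [FakeResponse(429), FakeResponse(429), FakeResponse(202)]
response = with_retry(lambda: answers.pop(0), sleep=lambda s: None)
print(response.status_code)   # 202
```

In the real script you would pass something like `lambda: requests.post(...)` as `send`.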

Remove previous requests

The service keeps up to 20,000 requests for each Azure subscription account. If your request count exceeds this limit, remove previous requests before making new ones. If you don't remove existing requests, you'll receive an error notification.

Add the following code to voice_synthesis_client.py:

parser.add_argument('--syntheses', action="store_true", default=False, help='print synthesis list')
parser.add_argument('--delete', action="store_true", default=False, help='delete a synthesis request')
parser.add_argument('-synthesisId', action="store", nargs='+', dest="synthesisId", help='the id of the voice synthesis which need to be deleted')

def getSubmittedSyntheses():
    response=requests.get(baseAddress+"voicesynthesis", headers={"Ocp-Apim-Subscription-Key":args.key}, verify=False)
    syntheses = json.loads(response.text)
    return syntheses

def deleteSynthesis(ids):
    for id in ids:
        print("delete voice synthesis %s " % id)
        response = requests.delete(baseAddress+"voicesynthesis/"+id, headers={"Ocp-Apim-Subscription-Key":args.key}, verify=False)
        if (response.status_code == 204):
            print("delete successful")
        else:
            print("delete failed, response.status_code: %d, response.text: %s " % (response.status_code, response.text))

if args.syntheses:
    syntheses = getSubmittedSyntheses()
    print("There are %d synthesis requests submitted:" % len(syntheses))
    for synthesis in syntheses:
        print ("ID : %s , Name : %s, Status : %s " % (synthesis['id'], synthesis['name'], synthesis['status']))

if args.delete:
    deleteSynthesis(args.synthesisId)

Run python voice_synthesis_client.py --syntheses -key <your_key> -region <region> to get a list of synthesis requests that you've made. You'll see an output like this:

There are <number> synthesis requests submitted:
ID : xxx , Name : xxx, Status : Succeeded
ID : xxx , Name : xxx, Status : Running
ID : xxx , Name : xxx, Status : Succeeded

To delete a request, run python voice_synthesis_client.py --delete -key <your_key> -region <Region> -synthesisId <synthesis_id>, and replace <synthesis_id> with a request ID value returned from the previous request.

Note

Requests with a status of 'Running'/'Waiting' cannot be removed or deleted.
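Since only finished requests can be deleted, it can help to filter the list from getSubmittedSyntheses() down to the IDs that are safe to pass to deleteSynthesis(). A minimal sketch (the helper name and the choice of final statuses are our own reading of the statuses used above):

```python
def deletable_ids(syntheses, final_statuses=("Succeeded", "Failed")):
    """Return the IDs of synthesis requests that have reached a final
    status and can therefore be removed to free up quota."""
    return [s['id'] for s in syntheses if s['status'] in final_statuses]

# Example with stand-in data shaped like the service response:
syntheses = [
    {'id': 'a1', 'status': 'Succeeded'},
    {'id': 'b2', 'status': 'Running'},
    {'id': 'c3', 'status': 'Failed'},
]
print(deletable_ids(syntheses))   # ['a1', 'c3']
```

In the real script, the result could be passed directly to deleteSynthesis().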

The completed voice_synthesis_client.py is available on GitHub.

HTTP status codes

The following table details the HTTP response codes and messages from the REST API.

API     | HTTP status code | Description | Solution
Create  | 400 | The voice synthesis is not enabled in this region. | Change the speech subscription key to one from a supported region.
        | 400 | Only the Standard speech subscription for this region is valid. | Change the speech subscription key to the "Standard" pricing tier.
        | 400 | Exceeded the 20,000 request limit for the Azure account. Remove some requests before submitting new ones. | The server keeps up to 20,000 requests for each Azure account. Delete some requests before submitting new ones.
        | 400 | This model cannot be used in the voice synthesis: {modelID}. | Make sure the state of {modelID} is correct.
        | 400 | The region for the request does not match the region for the model: {modelID}. | Make sure the region of {modelID} matches the request's region.
        | 400 | The voice synthesis only supports text files in UTF-8 encoding with the byte-order marker. | Make sure the input files are in UTF-8 encoding with the byte-order marker.
        | 400 | Only valid SSML inputs are allowed in the voice synthesis request. | Make sure the input SSML expressions are correct.
        | 400 | The voice name {voiceName} is not found in the input file. | The input SSML voice name is not aligned with the model ID.
        | 400 | The number of paragraphs in the input file should be less than 10,000. | Make sure the number of paragraphs in the file is less than 10,000.
        | 400 | The input file should be more than 400 characters. | Make sure your input file exceeds 400 characters.
        | 404 | The model declared in the voice synthesis definition cannot be found: {modelID}. | Make sure the {modelID} is correct.
        | 429 | Exceeded the active voice synthesis limit. Wait until some requests finish. | The server runs and queues up to 120 requests for each Azure account. Wait and avoid submitting new requests until some requests are completed.
All     | 429 | There are too many requests. | The client is allowed to submit up to 5 requests to the server per second for each Azure account. Reduce the number of requests per second.
Delete  | 400 | The voice synthesis task is still in use. | You can only delete requests that are Completed or Failed.
GetByID | 404 | The specified entity cannot be found. | Make sure the synthesis ID is correct.

Regions and endpoints

The Long Audio API is available in multiple regions with unique endpoints.

Region       | Endpoint
China East 2 | https://chinaeast2.api.speech.azure.cn

Audio output formats

We support flexible audio output formats. You can generate audio outputs per paragraph or concatenate the audio outputs into a single output by setting the 'concatenateResult' parameter. The following audio output formats are supported by the Long Audio API:

Note

The default audio format is riff-16khz-16bit-mono-pcm.

  • riff-8khz-16bit-mono-pcm
  • riff-16khz-16bit-mono-pcm
  • riff-24khz-16bit-mono-pcm
  • riff-48khz-16bit-mono-pcm
  • audio-16khz-32kbitrate-mono-mp3
  • audio-16khz-64kbitrate-mono-mp3
  • audio-16khz-128kbitrate-mono-mp3
  • audio-24khz-48kbitrate-mono-mp3
  • audio-24khz-96kbitrate-mono-mp3
  • audio-24khz-160kbitrate-mono-mp3
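Because the sample script takes the output format from a command-line argument, it can be worth validating the value against this list before submitting a request. A small sketch (the constant and function names are ours):

```python
# Supported output formats, copied from the list above.
SUPPORTED_FORMATS = {
    'riff-8khz-16bit-mono-pcm',
    'riff-16khz-16bit-mono-pcm',
    'riff-24khz-16bit-mono-pcm',
    'riff-48khz-16bit-mono-pcm',
    'audio-16khz-32kbitrate-mono-mp3',
    'audio-16khz-64kbitrate-mono-mp3',
    'audio-16khz-128kbitrate-mono-mp3',
    'audio-24khz-48kbitrate-mono-mp3',
    'audio-24khz-96kbitrate-mono-mp3',
    'audio-24khz-160kbitrate-mono-mp3',
}

def check_output_format(fmt):
    """Validate a format name against the Long Audio API's supported list,
    returning it unchanged so it can be used inline when building a request."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported output format: %s" % fmt)
    return fmt

print(check_output_format('riff-16khz-16bit-mono-pcm'))
```

Failing fast locally avoids burning one of the rate-limited requests on a 400 response.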

Sample code

Sample code for the Long Audio API is available on GitHub.