Improve synthesis with the Audio Content Creation tool

Audio Content Creation is an online tool that lets you customize and fine-tune Microsoft's text-to-speech output for your apps and products. You can use it to fine-tune public and custom voices for more accurate, natural expression, and to manage your output in the cloud.

The Audio Content Creation tool is based on Speech Synthesis Markup Language (SSML). To simplify customization and tuning, it lets you visually inspect your text-to-speech output in real time.

How does it work?

This diagram shows the steps it takes to fine-tune text-to-speech output. Use the links below to learn more about each step.

[Diagram: the steps used to fine-tune text-to-speech output.]

  1. Set up your Azure account and Speech resource to get started.

  2. Create an audio tuning file using plain text or SSML scripts.

  3. Choose the voice and the language for your script content. Audio Content Creation includes all of the Microsoft text-to-speech voices. You can use a standard voice, a neural voice, or your own custom voice.

    Note

    Gated access is available for Custom Neural Voice, which lets you create high-definition voices that sound similar to natural speech. For details, see Gating process.

  4. Review the default synthesis output. Then improve the output by adjusting pronunciation, breaks, pitch, rate, intonation, voice style, and more. For a complete list of options, see Speech Synthesis Markup Language.

  5. Save and export your tuned audio. After you save the tuning track in the system, you can continue to work and iterate on the output. When you're satisfied with the output, use the export feature to create an audio creation task. You can check the status of the export task and download the output for use with your apps and products.
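
The adjustments in step 4 (rate, pitch, breaks) correspond to SSML elements, so the same kind of tuning can be sketched in code. The following is a minimal illustration, not the tool's own implementation: it uses Python's standard xml.etree.ElementTree to wrap text in the standard SSML prosody element and place a break between segments. The function name build_tuned_ssml and its parameters are hypothetical; the voice name is the one used in the SSML example in this article.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize SSML elements without a prefix (as the default namespace).
ET.register_namespace("", SSML_NS)


def build_tuned_ssml(segments, voice, rate="medium", pitch="medium",
                     pause_ms=250, lang="en-US"):
    """Hypothetical helper: wrap text segments in <prosody> and put a
    <break> between consecutive segments."""
    if not segments:
        raise ValueError("at least one text segment is required")
    speak = ET.Element(f"{{{SSML_NS}}}speak",
                       {"version": "1.0", XML_LANG: lang})
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    prosody = ET.SubElement(voice_el, f"{{{SSML_NS}}}prosody",
                            {"rate": rate, "pitch": pitch})
    prosody.text = segments[0]
    for segment in segments[1:]:
        pause = ET.SubElement(prosody, f"{{{SSML_NS}}}break",
                              {"time": f"{pause_ms}ms"})
        pause.tail = segment  # text spoken after the pause
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_tuned_ssml(
        ["Welcome to Audio Content Creation.", "Let's tune this sentence."],
        voice="Microsoft Server Speech Text to Speech Voice (en-US, AriaNeural)",
        rate="slow", pitch="high", pause_ms=300))
```

The resulting string can be pasted into a new tuning file or saved as a .txt file for upload. Which rate and pitch values a given voice honors is not covered here.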

Set up your Azure account and Speech resource

  1. To work with Audio Content Creation, you must have an Azure account. You can create an Azure account by using your Microsoft account. Follow these instructions to set up an Azure account.
  2. Create a Speech resource in your Azure account. Make sure that the pricing tier is set to S0. If you're using one of the neural voices, make sure that you create the resource in a supported region.
  3. After you have the Azure account and the Speech resource, you can use the Speech service and access Audio Content Creation.
  4. Select the Speech resource you need to work on. You can also create a new Speech resource here.
  5. You can modify your Speech resource at any time by using the Settings option in the top navigation bar.

Create an audio tuning file

There are two ways to get your content into the Audio Content Creation tool.

Option 1:

  1. Click New file to create a new audio tuning file.
  2. Type or paste your content into the editing window. Each file can contain up to 20,000 characters. If your script is longer than 20,000 characters, use Option 2 to automatically split your content into multiple files.
  3. Don't forget to save.
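
To make the 20,000-character limit concrete, here is a rough sketch of how a long plain-text script could be divided by paragraph before it is pasted or uploaded. It is an illustration only and does not reproduce the tool's own splitting logic. The function name split_by_paragraph is hypothetical, paragraphs are assumed to be separated by blank lines, and characters are counted with Python's len, which may differ from how the tool counts them.

```python
MAX_CHARS = 20_000  # per-file limit stated in this article


def split_by_paragraph(script, max_chars=MAX_CHARS):
    """Hypothetical helper: group paragraphs into chunks of at most
    max_chars characters; oversized paragraphs are cut by character."""
    chunks = []
    current = ""
    for para in script.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs
        # A single paragraph longer than the limit is split by character.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "a" * 12_000 + "\n\n" + "b" * 12_000 + "\n\n" + "c" * 5_000
    for i, chunk in enumerate(split_by_paragraph(script), start=1):
        print(f"chunk {i}: {len(chunk)} characters")
```

Each returned chunk can then be saved as its own tuning file.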

Option 2:

  1. Click Upload to import one or more text files. Both plain text and SSML are supported.

  2. If your script file is longer than 20,000 characters, split the file by paragraph, by character, or by regular expression.

  3. When you upload your text files, make sure that each file meets these requirements.

    Property           Value / Notes
    File format        Plain text (.txt)
                       SSML text (.txt)
                       Zip files aren't supported
    Encoding format    UTF-8
    File name          Each file must have a unique name. Duplicates aren't supported.
    Text length        Text files must not exceed 20,000 characters.
    SSML restrictions  Each SSML file can only contain a single piece of SSML.
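
The upload requirements above can be checked locally before uploading. The sketch below is one possible pre-flight check, not part of the tool. The function name check_upload is hypothetical, it takes (file name, raw bytes) pairs rather than reading from disk, and it interprets "a single piece of SSML" as "parses as one well-formed XML document", which is an assumption.

```python
import xml.etree.ElementTree as ET

MAX_CHARS = 20_000  # text length limit from the requirements table


def check_upload(files):
    """Hypothetical pre-flight check.

    files: list of (file_name, raw_bytes) pairs.
    Returns a list of (file_name, problems); an empty list means OK.
    """
    results = []
    seen_names = set()
    for name, data in files:
        problems = []
        if not name.lower().endswith(".txt"):
            problems.append("not a .txt file")
        if name in seen_names:
            problems.append("duplicate file name")
        seen_names.add(name)
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("not UTF-8 encoded")
            text = None
        if text is not None:
            if len(text) > MAX_CHARS:
                problems.append("longer than 20,000 characters")
            # Assumption: content starting with <speak is meant as SSML
            # and must parse as exactly one XML document.
            if text.lstrip().startswith("<speak"):
                try:
                    ET.fromstring(text)
                except ET.ParseError:
                    problems.append("not a single well-formed SSML document")
        results.append((name, problems))
    return results


if __name__ == "__main__":
    sample = [("intro.txt", "Hello there.".encode("utf-8")),
              ("intro.txt", b"Duplicate name."),
              ("two.txt", b"<speak>one</speak><speak>two</speak>")]
    for name, problems in check_upload(sample):
        print(name, "->", problems or "OK")
```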

Plain text example

Welcome to use Audio Content Creation to customize audio output for your products.

SSML text example

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" version="1.0" xml:lang="en-US">
    <voice name="Microsoft Server Speech Text to Speech Voice (en-US, AriaNeural)">
    Welcome to use Audio Content Creation <break time="10ms" />to customize audio output for your products.
    </voice>
</speak>
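
As a quick way to see what the SSML example contains, the following sketch parses it with Python's standard xml.etree.ElementTree and lists the language, voice name, break durations, and spoken text. The function name summarize_ssml is hypothetical, and the approach is only an illustration of reading an SSML file, not something the tool requires.

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.w3.org/2001/10/synthesis"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The SSML text example from this article.
SSML_EXAMPLE = """<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" version="1.0" xml:lang="en-US">
    <voice name="Microsoft Server Speech Text to Speech Voice (en-US, AriaNeural)">
    Welcome to use Audio Content Creation <break time="10ms" />to customize audio output for your products.
    </voice>
</speak>"""


def summarize_ssml(ssml):
    """Hypothetical helper: extract the main settings from an SSML string."""
    root = ET.fromstring(ssml)
    return {
        "lang": root.get(XML_LANG),
        "voices": [v.get("name") for v in root.findall("s:voice", NS)],
        "breaks": [b.get("time") for b in root.iterfind(".//s:break", NS)],
        # Collapse whitespace so the spoken text reads as one line.
        "text": " ".join("".join(root.itertext()).split()),
    }


if __name__ == "__main__":
    for key, value in summarize_ssml(SSML_EXAMPLE).items():
        print(f"{key}: {value}")
```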

Export tuned audio

After you've reviewed your audio output and are satisfied with your tuning and adjustments, you can export the audio.

  1. Click Export to create an audio creation task. Export to Audio Library is recommended because it supports long audio output and the full audio output experience. You can also download the audio directly to your local disk, but only the first 10 minutes are available.
  2. Choose the output format for your tuned audio. A list of supported formats and sample rates is available below.
  3. You can view the status of the task on the Export task tab. If the task fails, see the detailed information page for a full report.
  4. When the task is complete, your audio is available for download on the Audio Library tab.
  5. Click Download. Now you're ready to use your custom tuned audio in your apps or products.

Supported audio formats

Format  16 kHz sample rate                24 kHz sample rate
wav     riff-16khz-16bit-mono-pcm         riff-24khz-16bit-mono-pcm
mp3     audio-16khz-128kbitrate-mono-mp3  audio-24khz-160kbitrate-mono-mp3
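
The table can be expressed as a small lookup, and the format identifiers suggest a rough way to estimate file size. The sketch below is an approximation under stated assumptions: it reads the sample rate, bit depth, and bitrate from the identifier names, treats mp3 output as constant bitrate, and ignores container headers. The names OUTPUT_FORMATS, output_format, and approx_bytes are hypothetical.

```python
# (container, sample rate in kHz) -> output format identifier, from the table.
OUTPUT_FORMATS = {
    ("wav", 16): "riff-16khz-16bit-mono-pcm",
    ("wav", 24): "riff-24khz-16bit-mono-pcm",
    ("mp3", 16): "audio-16khz-128kbitrate-mono-mp3",
    ("mp3", 24): "audio-24khz-160kbitrate-mono-mp3",
}


def output_format(container, sample_rate_khz):
    """Look up the identifier for a container and sample rate."""
    return OUTPUT_FORMATS[(container, sample_rate_khz)]


def approx_bytes(format_id, seconds):
    """Rough size estimate; assumes the identifier encodes its parameters
    and ignores file headers."""
    parts = format_id.split("-")
    if parts[0] == "riff":
        # Uncompressed mono PCM: samples per second * bytes per sample.
        sample_rate = int(parts[1].replace("khz", "")) * 1000
        bytes_per_sample = int(parts[2].replace("bit", "")) // 8
        return sample_rate * bytes_per_sample * seconds
    # mp3: bitrate in kilobits per second, converted to bytes per second.
    kbps = int(parts[2].replace("kbitrate", ""))
    return (kbps * 1000 // 8) * seconds


if __name__ == "__main__":
    ten_minutes = 10 * 60  # the direct-download limit mentioned above
    for key, format_id in OUTPUT_FORMATS.items():
        size_mb = approx_bytes(format_id, ten_minutes) / 1_000_000
        print(f"{key}: {format_id} is about {size_mb:.1f} MB for 10 minutes")
```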
