What is the Speech service?

The Speech service unifies speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, the Speech Devices SDK, or the REST APIs.

The following features make up the Speech service. Use the links in this table to learn more about common use cases for each feature or to browse the API reference.

Service | Feature | Description | SDK | REST
Speech-to-Text | Real-time speech-to-text | Speech-to-text transcribes or translates audio streams or local files to text in real time, so that your applications, tools, or devices can consume or display it. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. | Yes | Yes
Speech-to-Text | Batch speech-to-text | Batch speech-to-text enables asynchronous transcription of large volumes of speech audio stored in Azure Blob Storage. In addition to converting speech audio to text, it also supports diarization and sentiment analysis. | No | Yes
Speech-to-Text | Create custom speech models | If you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. | No | Yes
Text-to-Speech | Text-to-speech | Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Choose from standard voices and neural voices (see Language support). | Yes | Yes
Speech Translation | Speech translation | Speech translation enables real-time, multi-language translation of speech in your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation. | Yes | No
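
Because text-to-speech takes its input as SSML, it can help to see what a minimal SSML document looks like. The sketch below builds one with Python's standard library only. The helper name `build_ssml` and the voice name passed to it are hypothetical; real voice names come from the Language support page. The namespace URI is the one defined by the W3C SSML specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Reserved namespace behind the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return a minimal SSML document as a string (hypothetical helper)."""
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    # voice_name is a placeholder; substitute a voice the service supports.
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    # ElementTree escapes characters such as & and < when serializing.
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello from the Speech service.", "example-voice"))
```

Building the document with an XML library rather than string concatenation means special characters in the input text are escaped automatically.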

Important

TLS 1.2 is now enforced for all HTTP requests to this service.
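
A client that calls the REST endpoints directly may want to fail fast instead of attempting an older protocol. As a minimal sketch, assuming a Python client built on the standard `ssl` module, the context below refuses anything older than TLS 1.2. The helper name `make_tls12_context` is illustrative and not part of any Speech API.

```python
import ssl


def make_tls12_context():
    """Return an SSL context that will not negotiate below TLS 1.2."""
    # Default context: certificate verification and hostname checks enabled.
    ctx = ssl.create_default_context()
    # Reject TLS 1.0 and 1.1 during the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    context = make_tls12_context()
    print(context.minimum_version)
```

The returned context can be passed to `urllib.request.urlopen(..., context=context)` or to `http.client.HTTPSConnection(..., context=context)`.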

Try the Speech service

We offer quickstarts in most popular programming languages, each designed to have you running code in less than 10 minutes. The following table lists the most popular quickstarts for each feature. Use the left-hand navigation to explore additional languages and platforms.

Speech-to-text (SDK) | Text-to-speech (SDK) | Translation (SDK)
Recognize speech from an audio file | Synthesize speech into an audio file | Translate speech to text
Recognize speech with a microphone | Synthesize speech to a speaker | Translate speech to multiple target languages
Recognize speech stored in blob storage | | Translate speech-to-speech
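
To give a feel for the first quickstart in the table, here is a minimal sketch of single-shot recognition from an audio file. It assumes the Speech SDK for Python (the `azure-cognitiveservices-speech` package) is installed; the function name `recognize_from_file` and its arguments are placeholders for your own subscription key, region, and WAV file path. The quickstarts themselves remain the authoritative walkthrough.

```python
def recognize_from_file(key, region, wav_path):
    """Return the recognized text for one utterance, or None if nothing was recognized."""
    # Imported here so this module still loads when the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    # key and region are placeholders for your own subscription values.
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Single-shot recognition: returns after the first utterance ends.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    # No match or canceled (for example, bad credentials or an unreadable file).
    return None
```

For longer audio, continuous recognition is usually the better fit, since `recognize_once` stops after a single utterance.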

Note

Speech-to-text and text-to-speech also have REST endpoints and associated quickstarts.
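
As a rough illustration of what a call to the speech-to-text REST endpoint involves, the sketch below only builds the HTTP request and does not send it. The host pattern, path, and header names are assumptions based on the global Azure short-audio endpoint and may differ in your cloud or region, so check the REST quickstart for the values that apply to you. The function name `build_stt_request` and its parameters are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def build_stt_request(audio_bytes, key, region, language="en-US",
                      host_suffix="stt.speech.microsoft.com"):
    """Build (but do not send) a POST request carrying WAV audio."""
    # Assumed endpoint layout; host_suffix is a parameter because
    # sovereign clouds use a different domain.
    query = urlencode({"language": language})
    url = (
        f"https://{region}.{host_suffix}"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # Subscription key header (assumed name).
        "Ocp-Apim-Subscription-Key": key,
        # 16 kHz PCM WAV audio is assumed here.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_stt_request(b"", "YOUR_KEY", "westus")
    print(request.get_method(), request.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON body of the response.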

Get sample code

Important

Speech SDK version 1.11.0 or later is required.
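
One way to guard against an outdated SDK is to check the installed version before running the samples. The following is a minimal sketch for a Python environment; the package name `azure-cognitiveservices-speech` is an assumption about how the SDK was installed, and the helper names are illustrative. Versions are compared numerically, because a plain string comparison would wrongly rank 1.9.0 above 1.11.0.

```python
from importlib import metadata

# Minimum version stated in this article.
MIN_SDK_VERSION = (1, 11, 0)


def parse_version(version):
    """Turn '1.11.0' (or '1.11.0rc1') into a tuple such as (1, 11, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # ignore suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # treat '2.0' as (2, 0, 0)
    return tuple(parts)


def meets_minimum(version, minimum=MIN_SDK_VERSION):
    """Return True if the version string is at least the minimum."""
    return parse_version(version) >= minimum


def installed_sdk_meets_minimum(package="azure-cognitiveservices-speech"):
    """Check the installed package; the package name is an assumption."""
    try:
        return meets_minimum(metadata.version(package))
    except metadata.PackageNotFoundError:
        return False
```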

Sample code for the Speech service is available on GitHub. The samples cover common scenarios such as reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:

Customize your speech experience

The Speech service works well with built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand.

Speech service | Platform | Description
Speech-to-Text | Custom Speech | Customize speech recognition models to your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary, and background noise.

Reference docs

Next steps