About the Speech SDK

The Speech software development kit (SDK) exposes many of the Speech service capabilities so that you can develop speech-enabled applications. The Speech SDK is available in many programming languages and across all platforms.

Programming language | Platform | SDK reference
C# 1 | Windows, Linux, macOS, Mono, Xamarin.iOS, Xamarin.Mac, Xamarin.Android, UWP, Unity | .NET SDK
C++ | Windows, Linux, macOS | C++ SDK
Java 2 | Android, Windows, Linux, macOS | Java SDK
JavaScript | Browser, Node.js | JavaScript SDK
Objective-C / Swift | iOS, macOS | Objective-C SDK
Python | Windows, Linux, macOS | Python SDK

1 The .NET Speech SDK is based on .NET Standard 2.0, so it supports many platforms. For more information, see .NET implementation support.

2 The Java Speech SDK is also available as part of the Speech Devices SDK.

Scenario capabilities

The Speech SDK exposes many features from the Speech service, but not all of them. The capabilities of the Speech SDK are often associated with scenarios. The Speech SDK suits both real-time and non-real-time scenarios, using local devices, files, Azure blob storage, and even input and output streams. When a scenario is not achievable with the Speech SDK, look for a REST API alternative.

Speech-to-text

Speech-to-text (also known as speech recognition) transcribes audio streams to text that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. Use Speech Translation to translate speech input to a different language with a single call. For more information, see Speech-to-text basics.

Text-to-speech

Text-to-speech (also known as speech synthesis) converts text into human-like synthesized speech. The input text is either a string literal or a Speech Synthesis Markup Language (SSML) document. For more information on standard or neural voices, see Text-to-speech language and voice support.
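To illustrate the difference between the two input forms, the following sketch wraps a plain string in a minimal SSML document. It uses only the Python standard library and does not call the Speech SDK. The helper name build_ssml and the default voice name en-US-AriaNeural are assumptions made for illustration; consult the language and voice support list for the voices available to your resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-AriaNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document (hypothetical helper)."""
    # escape() protects characters such as & and < that are special in XML.
    body = escape(text)
    # quoteattr() returns the attribute value already wrapped in quotes.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # The voice name here is an assumed example, not taken from this article.
    print(build_ssml("Tom & Jerry say hello."))
```

Escaping the text before embedding it matters because unescaped characters such as & would make the SSML document invalid XML.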

Codec compressed audio input

Several of the Speech SDK programming languages support codec compressed audio input streams. For more information, see Use compressed audio input formats.

REST API

While the Speech SDK covers many capabilities of the Speech service, for some scenarios you might want to use the REST API.

Batch transcription

Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. Batch transcription is available only through the REST API. In addition to converting speech audio to text, batch speech-to-text also allows for diarization and sentiment analysis.

Customization

The Speech service delivers great functionality with its default models across speech-to-text, text-to-speech, and speech translation. Sometimes you may want to improve on the baseline performance so that it works even better with your unique use case. The Speech service has a variety of no-code customization tools that make this easy and allow you to create a competitive advantage with custom models based on your own data. These models are available only to you and your organization.

Custom Speech-to-text

When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. The creation and management of no-code Custom Speech models is available through the Custom Speech Portal. Once the Custom Speech model is published, it can be consumed by the Speech SDK.

Get the Speech SDK

Important

Speech SDK version 1.11.0 or later is required.

The Speech SDK supports Windows 10 and Windows Server 2016, or later versions. Earlier versions are not officially supported. It is possible to use parts of the Speech SDK with earlier versions of Windows, although it's not advised.


Windows

System requirements

The Speech SDK on Windows requires the Microsoft Visual C++ Redistributable for Visual Studio 2019 on the system.

C#

The .NET Speech SDK is available as a NuGet package and implements .NET Standard 2.0. For more information, see Microsoft.CognitiveServices.Speech.



C# NuGet package

The .NET Speech SDK can be installed from the .NET Core CLI with the following dotnet add command.

dotnet add package Microsoft.CognitiveServices.Speech

The .NET Speech SDK can be installed from the Package Manager with the following Install-Package command.

Install-Package Microsoft.CognitiveServices.Speech

Additional resources

For microphone input, the Media Foundation libraries must be installed. These libraries are part of Windows 10 and Windows Server 2016. It's possible to use the Speech SDK without these libraries, as long as a microphone isn't used as the audio input device.

The required Speech SDK files can be deployed in the same directory as your application. This way your application can directly access the libraries. Make sure you select the correct version (x86/x64) that matches your application.

Name | Function
Microsoft.CognitiveServices.Speech.core.dll | Core SDK, required for native and managed deployment
Microsoft.CognitiveServices.Speech.csharp.dll | Required for managed deployment

Note

Starting with release 1.3.0, the file Microsoft.CognitiveServices.Speech.csharp.bindings.dll (shipped in previous releases) is no longer needed. The functionality is now integrated in the core SDK.

Important

For the Windows Forms App (.NET Framework) C# project, make sure the libraries are included in your project's deployment settings. You can check this under Properties -> Publish Section. Click the Application Files button and find the corresponding libraries in the list. Make sure the value is set to Included. Visual Studio will include the file when the project is published or deployed.

C++

The C++ Speech SDK is available on Windows, Linux, and macOS. For more information, see Microsoft.CognitiveServices.Speech.



C++ NuGet package

The C++ Speech SDK can be installed from the Package Manager with the following Install-Package command.

Install-Package Microsoft.CognitiveServices.Speech

C++ binaries and header files

Alternatively, the C++ Speech SDK can be installed from binaries. Download the SDK as a .tar package and unpack the files in a directory of your choice. The contents of this package (which include header files for both x86 and x64 target architectures) are structured as follows:

Path | Description
license.md | License
ThirdPartyNotices.md | Third-party notices
include | Header files for C++
lib/x64 | Native x64 library for linking with your application
lib/x86 | Native x86 library for linking with your application

To create an application, copy or move the required binaries (and libraries) into your development environment. Include them as required in your build process.

Additional resources

Python

The Python Speech SDK is available as a Python Package Index (PyPI) module. For more information, see azure-cognitiveservices-speech. The Python Speech SDK is compatible with Windows, Linux, and macOS.


pip install azure-cognitiveservices-speech

Tip

If you are on macOS, you may need to run the following command to get the pip command above to work:

python3 -m pip install --upgrade pip
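After installation, one way to confirm that the installed package meets the 1.11.0 minimum stated in this article is to read the package metadata. The sketch below relies only on the Python standard library (importlib.metadata, available in Python 3.8 and later); the helper names parse_version, meets_minimum, and installed_sdk_version are hypothetical and not part of the Speech SDK.

```python
import re
from importlib import metadata

MINIMUM_VERSION = (1, 11, 0)  # minimum version stated in this article


def parse_version(version):
    """Turn '1.12.1' into (1, 12, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.11' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum=MINIMUM_VERSION):
    """Return True when the version string is at least the minimum."""
    return parse_version(version) >= minimum


def installed_sdk_version(package="azure-cognitiveservices-speech"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sdk_version()
    if found is None:
        print("Speech SDK is not installed")
    else:
        print(found, "OK" if meets_minimum(found) else "too old")
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "1.9.0" sorts after "1.11.0" alphabetically.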

Additional resources

Java

The Java SDK for Android is packaged as an AAR (Android Library), which includes the necessary libraries and required Android permissions. It's hosted in a Maven repository at https://csspeechstorage.blob.core.windows.net/maven/ as package com.microsoft.cognitiveservices.speech:client-sdk:1.12.1.



To consume the package from your Android Studio project, make the following changes:

  1. In the project-level build.gradle file, add the following to the repositories section:
maven { url 'https://csspeechstorage.blob.core.windows.net/maven/' }
  2. In the module-level build.gradle file, add the following to the dependencies section:
implementation 'com.microsoft.cognitiveservices.speech:client-sdk:1.12.1'

The Java SDK is also part of the Speech Devices SDK.

Additional resources

Important

By downloading any of the Azure Cognitive Services Speech SDKs, you acknowledge its license. For more information, see:

Sample source code

The Speech SDK team actively maintains a large set of examples in an open-source repository. For the sample source code repository, visit the Microsoft Cognitive Services Speech SDK on GitHub. There are samples for C#, C++, Java, Python, Objective-C, Swift, JavaScript, UWP, Unity, and Xamarin.



Next steps