Learn the basics of the Speech CLI

In this article, you learn the basic usage patterns of the Speech CLI, a command-line tool for using the Speech service without writing code. You can quickly test the main features of the Speech service, without creating a development environment or writing any code, to see whether it adequately meets your use cases. The Speech CLI is also production ready, and you can use it to automate simple workflows in the Speech service with .bat or shell scripts.

Download and install


On Windows, you need the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. Installing it for the first time may require you to restart Windows.

Follow these steps to install the Speech CLI on Windows:

  1. Download the Speech CLI zip archive, then extract it.
  2. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In the command prompt, change directory to this location, and then type spx to see help for the Speech CLI.


On Windows, the Speech CLI can only show fonts available to the command prompt on the local computer. Windows Terminal supports all fonts produced interactively by the Speech CLI. If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.


PowerShell does not check the local directory when looking for a command. In PowerShell, change directory to the location of spx and call the tool by entering .\spx. If you add this directory to your path, PowerShell and the Windows command prompt will find spx from any directory, without the .\ prefix.
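
The path behavior described above can be checked from a script before any spx command is run. The following is a minimal Python sketch, not part of the Speech CLI; the function name find_spx is hypothetical. It relies only on the standard library's shutil.which, which searches the directories on the path. On Windows, shutil.which may also search the current directory, as the command prompt does, so run the check from a directory other than the one that contains spx.

```python
import shutil


def find_spx(command: str = "spx"):
    """Return the full path of the command if it can be found, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_spx()
    if location is None:
        # Hypothetical message; adjust to your own setup.
        print("spx was not found; change to its directory and enter .\\spx")
    else:
        print(f"spx found at {location}")
```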

Create subscription config

To start using the Speech CLI, you need to enter your Speech subscription key and region identifier. Get these credentials by following the steps in Try the Speech service for free. Once you have your subscription key and region identifier (for example, chinaeast2), run the following commands.

spx config @key --set SUBSCRIPTION-KEY
spx config @region --set REGION

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Basic usage

This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by viewing the help built in to the tool by running the following command.

spx

Notice the see: help topics listed to the right of the command parameters. You can enter these commands to get detailed help about sub-commands.

You can search help topics by keyword. For example, enter the following command to see a list of Speech CLI usage examples:

spx help find --topics "examples"

Enter the following command to see options for the recognize command:

spx help recognize

Now use the Speech service to perform speech recognition from your default microphone by running the following command.

spx recognize --microphone

After you enter the command, SPX begins listening for audio on the current active input device and stops after you press ENTER. The recorded speech is then recognized and converted to text in the console output. Text-to-speech synthesis is also easy to do with the Speech CLI.

The following command takes the entered text as input and outputs the synthesized speech to the current active output device.

spx synthesize --text "Testing synthesis using the Speech CLI" --speakers

In addition to speech recognition and synthesis, you can also do speech translation with the Speech CLI. As with the speech recognition command above, run the following command to capture audio from your default microphone and translate it to text in the target language.

spx translate --microphone --source en-US --target ru-RU --output file C:\some\file\path\russian_translation.txt

In this command, you specify both the source language (the language to translate from) and the target language (the language to translate to). The --microphone argument listens to audio on the current active input device and stops after you press ENTER. The output is a text translation in the target language, written to a text file.


See the language and locale article for a list of all supported languages with their corresponding locale codes.

Configuration files in the datastore

The Speech CLI's behavior can rely on settings in configuration files, which you refer to within Speech CLI calls by using the @ symbol. The Speech CLI saves a new setting in a ./spx/data subdirectory that it creates in the current working directory. When seeking a configuration value, the Speech CLI looks in your current working directory, then in the datastore at ./spx/data, and then in other datastores, including a final read-only datastore in the spx binary. Earlier, you used the datastore to save your @key and @region values, so you did not need to specify them with each command-line call. You can also use configuration files to store your own configuration settings, or even use them to pass URLs or other dynamic content generated at runtime.
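
The lookup order described above is a first-match search across an ordered list of locations. The following Python sketch illustrates that idea only; it is not the Speech CLI's implementation. It assumes, for illustration, that each setting is a plain file named after the setting, and it does not model the other datastores or the read-only datastore in the spx binary. The function name resolve_setting is hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_setting(name: str, search_dirs: Iterable[Path]) -> Optional[str]:
    """Return the contents of the first file called `name` found in search_dirs.

    Directories are tried in the order given, so earlier entries take
    precedence over later ones.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8").strip()
    return None


if __name__ == "__main__":
    cwd = Path.cwd()
    # Order taken from the text: working directory first, then ./spx/data.
    print(resolve_setting("region", [cwd, cwd / "spx" / "data"]))
```

Because the working directory is searched first, a file placed there overrides a value of the same name saved in the datastore.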

This section shows how to use a configuration file in the local datastore to store and fetch command settings with spx config, and how to store output from the Speech CLI with the --output option.

The following example clears the @my.defaults configuration file, adds key-value pairs for key and region to the file, and uses the configuration in a call to spx recognize.

spx config @my.defaults --clear
spx config @my.defaults --add key 000072626F6E20697320636F6F6C0000
spx config @my.defaults --add region chinaeast2

spx config @my.defaults

spx recognize --nodefaults @my.defaults --file hello.wav

You can also write dynamic content to a configuration file. For example, the first command below creates a custom speech model and stores the URL of the new model in a configuration file. The second command waits until the model at that URL is ready for use before returning.

spx csr model create --name "Example 4" --datasets @my.datasets.txt --output url @my.model.txt
spx csr model status --model @my.model.txt --wait

The following example writes two URLs to the @my.datasets.txt configuration file. In this scenario, --output can include an optional add keyword to create a configuration file or append to an existing one.

spx csr dataset create --name "LM" --kind Language --content https://crbn.us/data.txt --output url @my.datasets.txt
spx csr dataset create --name "AM" --kind Acoustic --content https://crbn.us/audio.zip --output add url @my.datasets.txt

spx config @my.datasets.txt

For more details about datastore files, including use of default configuration files (@spx.default, @default.config, and @*.default.config for command-specific default settings), enter this command:

spx help advanced setup

Batch operations

The commands in the previous sections are great for quickly seeing how the Speech service works. However, when assessing whether your use cases can be met, you likely need to perform batch operations against a range of input you already have, to see how the service handles a variety of scenarios. This section shows how to:

  • Run batch speech recognition on a directory of audio files
  • Iterate through a .tsv file and run batch text-to-speech synthesis

Batch speech recognition

If you have a directory of audio files, the Speech CLI makes it easy to run batch speech recognition. Run the following command, pointing to your directory with the --files argument. In this example, you append \*.wav to the directory to recognize all .wav files present in the directory. Additionally, specify the --threads argument to run the recognition on 10 parallel threads.


The --threads argument can also be used with the spx synthesize commands in the next section. The number of available threads depends on the CPU and its current load percentage.

spx recognize --files C:\your_wav_file_dir\*.wav --output file C:\output_dir\speech_output.tsv --threads 10

The recognized speech output is written to speech_output.tsv using the --output file argument. The following is an example of the output file structure.

audio.input.id    recognizer.session.started.sessionid    recognizer.recognized.result.text
sample_1    07baa2f8d9fd4fbcb9faea451ce05475    A sample wave file.
sample_2    8f9b378f6d0b42f99522f1173492f013    Sample text synthesized.
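
Once the output file exists, it can be read like any other tab-separated file. The following is a minimal Python sketch, assuming the column names shown in the example above; the function name read_recognition_results is hypothetical, and the sample data is the example output rather than a real result.

```python
import csv
import io


def read_recognition_results(tsv_file):
    """Map each audio.input.id to its recognized text."""
    # QUOTE_NONE keeps any quotation marks in the recognized text as-is.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["audio.input.id"]: row["recognizer.recognized.result.text"]
        for row in reader
    }


if __name__ == "__main__":
    sample = (
        "audio.input.id\trecognizer.session.started.sessionid\t"
        "recognizer.recognized.result.text\n"
        "sample_1\t07baa2f8d9fd4fbcb9faea451ce05475\tA sample wave file.\n"
        "sample_2\t8f9b378f6d0b42f99522f1173492f013\tSample text synthesized.\n"
    )
    for audio_id, text in read_recognition_results(io.StringIO(sample)).items():
        print(audio_id, "->", text)
```

To read a real file, pass an open file object instead of the in-memory sample, for example open(path, newline="", encoding="utf-8").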

Synthesize speech to a file

Run the following command to send the output to a .wav file instead of your speaker.

spx synthesize --text "The speech synthesizer greets you!" --audio output greetings.wav

The Speech CLI synthesizes natural-sounding English speech into the greetings.wav audio file. In Windows, you can play the audio file by entering start greetings.wav.

Batch text-to-speech synthesis

The easiest way to run batch text-to-speech is to create a new .tsv (tab-separated values) file and use the --foreach command in the Speech CLI. Consider the following file, text_synthesis.tsv:

audio.output    text
C:\batch_wav_output\wav_1.wav    Sample text to synthesize.
C:\batch_wav_output\wav_2.wav    Using the Speech CLI to run batch-synthesis.
C:\batch_wav_output\wav_3.wav    Some more text to test capabilities.

Next, run a command that points to text_synthesis.tsv, performs synthesis on each text field, and writes the result to the corresponding audio.output path as a .wav file.

spx synthesize --foreach in @C:\your\path\to\text_synthesis.tsv

This command is equivalent to running spx synthesize --text "Sample text to synthesize." --audio output C:\batch_wav_output\wav_1.wav once for each record in the .tsv file, with that record's values. A couple of things to note:

  • The column headers, audio.output and text, correspond to the command-line arguments --audio output and --text, respectively. Multi-part command-line arguments like --audio output should be formatted in the file with no spaces, no leading dashes, and periods separating the parts, for example audio.output. Any other existing command-line arguments can be added to the file as additional columns using this pattern.
  • When the file is formatted in this way, no additional arguments need to be passed to --foreach.
  • Make sure that each value in the .tsv is separated by a tab.
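
The header convention in the notes above can be sketched in a few lines of Python. This only illustrates how a column header maps to an argument and what the equivalent per-record commands look like; it does not run spx, and it is not how the Speech CLI itself processes the file. The function names header_to_args and equivalent_commands are hypothetical.

```python
import csv
import io


def header_to_args(header: str):
    """Turn 'audio.output' into ['--audio', 'output'] and 'text' into ['--text']."""
    first, *rest = header.split(".")
    return ["--" + first, *rest]


def equivalent_commands(tsv_file):
    """Build one spx synthesize argument list per record in the .tsv file."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    headers = next(reader)
    commands = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        args = ["spx", "synthesize"]
        for header, value in zip(headers, record):
            args += header_to_args(header) + [value]
        commands.append(args)
    return commands


if __name__ == "__main__":
    sample = (
        "audio.output\ttext\n"
        "C:\\batch_wav_output\\wav_1.wav\tSample text to synthesize.\n"
    )
    for command in equivalent_commands(io.StringIO(sample)):
        print(command)
```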

However, suppose you have a .tsv file like the following example, with column headers that do not match command-line arguments:

wav_path    str_text
C:\batch_wav_output\wav_1.wav    Sample text to synthesize.
C:\batch_wav_output\wav_2.wav    Using the Speech CLI to run batch-synthesis.
C:\batch_wav_output\wav_3.wav    Some more text to test capabilities.

You can override these field names with the correct arguments by using the following syntax in the --foreach call. The result is the same as the call above.

spx synthesize --foreach audio.output;text in @C:\your\path\to\text_synthesis.tsv
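
An alternative to overriding the field names in the --foreach call is to rewrite the header row of the file so that it matches the argument names. The following Python sketch shows one way to do that; it is a convenience script, not a Speech CLI feature, and the function name rename_headers and the mapping are assumptions for illustration.

```python
def rename_headers(tsv_text: str, mapping: dict) -> str:
    """Return tsv_text with the header row renamed according to mapping.

    Headers that are not in the mapping are left unchanged, and all data
    rows are passed through untouched.
    """
    header_line, separator, body = tsv_text.partition("\n")
    headers = [mapping.get(name, name) for name in header_line.split("\t")]
    return "\t".join(headers) + separator + body


if __name__ == "__main__":
    original = (
        "wav_path\tstr_text\n"
        "C:\\batch_wav_output\\wav_1.wav\tSample text to synthesize.\n"
    )
    mapping = {"wav_path": "audio.output", "str_text": "text"}
    print(rename_headers(original, mapping), end="")
```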

Next steps