Learn the basics of the Speech CLI

In this article, you learn the basic usage patterns of the Speech CLI, a command-line tool for using the Speech service without writing code. You can quickly test the main features of the Speech service to see whether it adequately meets your use cases, without creating a development environment or writing any code. The Speech CLI is also production ready: you can use it in .bat or shell scripts to automate simple workflows in the Speech service.

Prerequisites

The only prerequisite is an Azure Speech subscription. If you don't already have one, see the guide on creating a new subscription.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install either .NET Framework 4.7 or .NET Core 3.0.
  2. Download the Speech CLI zip archive, then extract it.
  3. Go to the root directory spx-zips that you extracted from the download, and extract the subdirectory that you need (spx-net471 for .NET Framework 4.7, or spx-netcore-win-x64 for .NET Core 3.0 on an x64 CPU).

In a command prompt, change to this directory, and then type spx to see help for the Speech CLI.

Create subscription config

To start using the Speech CLI, you first need to enter your Speech subscription key and region information. See the region support page to find your region identifier. Once you have your subscription key and region identifier (for example, chinaeast2), run the following commands.

spx config @key --set YOUR-SUBSCRIPTION-KEY
spx config @region --set YOUR-REGION-ID
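If you automate the Speech CLI from a script, you might prefer not to hard-code the subscription key. The following Python sketch runs the same two spx config commands shown above through subprocess, reading the key and region from environment variables. The variable names SPEECH_KEY and SPEECH_REGION and both function names are hypothetical choices for this example. The sketch also assumes that spx is on your PATH or in the current directory.

```python
import os
import subprocess


def build_config_commands(key: str, region: str) -> list[list[str]]:
    """Return the two `spx config` invocations from this section as argument lists."""
    return [
        ["spx", "config", "@key", "--set", key],
        ["spx", "config", "@region", "--set", region],
    ]


def store_credentials() -> None:
    """Store the key and region so that future SPX requests can use them."""
    # SPEECH_KEY and SPEECH_REGION are hypothetical names; use whatever your environment defines.
    key = os.environ["SPEECH_KEY"]
    region = os.environ["SPEECH_REGION"]
    for command in build_config_commands(key, region):
        # A list avoids shell quoting issues; check=True raises CalledProcessError on a non-zero exit.
        subprocess.run(command, check=True)
```

Call store_credentials() once after setting the two variables. Because the arguments are passed as a list, the key is never interpreted by a shell.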

Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run spx config @region --clear or spx config @key --clear.

Basic usage

This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by running the following command, which performs speech recognition using your default microphone.

spx recognize --microphone

After you enter the command, SPX begins listening for audio on the current active input device and stops after you press ENTER. The recorded speech is then recognized and converted to text in the console output. Text-to-speech synthesis is also easy to do with the Speech CLI.

The following command takes the entered text as input and outputs the synthesized speech to the current active output device.

spx synthesize --text "Testing synthesis using the Speech CLI" --speakers

In addition to speech recognition and synthesis, you can also do speech translation with the Speech CLI. Like the speech recognition command above, the following command captures audio from your default microphone and translates it to text in the target language.

spx translate --microphone --source en-US --target ru-RU --output file C:\some\file\path\russian_translation.txt

In this command, you specify both the source language (the language to translate from) and the target language (the language to translate to). The --microphone argument listens to audio on the current active input device and stops after you press ENTER. The output is a text translation in the target language, written to a text file.

Note

See the language and locale article for a list of all supported languages and their corresponding locale codes.

Batch operations

The commands in the previous section are great for quickly seeing how the Speech service works. However, when assessing whether your use cases can be met, you likely need to run batch operations against input you already have, to see how the service handles a variety of scenarios. This section shows how to:

  • Run batch speech recognition on a directory of audio files
  • Iterate through a .tsv file and run batch text-to-speech synthesis

Batch speech recognition

If you have a directory of audio files, the Speech CLI makes it easy to run batch speech recognition. Run the following command, pointing the --files argument at your directory. In this example, you append \*.wav to the directory path to recognize all .wav files in that directory. The --threads argument runs the recognition on 10 parallel threads.

Note

The --threads argument can also be used with the spx synthesize commands in the next section. The number of available threads depends on the CPU and its current load.

spx recognize --files C:\your_wav_file_dir\*.wav --output file C:\output_dir\speech_output.tsv --threads 10

The --output file argument writes the recognized speech output to speech_output.tsv. The following is an example of the output file structure.

audio.input.id    recognizer.session.started.sessionid    recognizer.recognized.result.text
sample_1    07baa2f8d9fd4fbcb9faea451ce05475    A sample wave file.
sample_2    8f9b378f6d0b42f99522f1173492f013    Sample text synthesized.
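To work with these results in a script, you can load the file into a dictionary keyed by audio.input.id. The following Python sketch makes three assumptions: the file is UTF-8 encoded (with or without a byte order mark), it has a header row like the one above, and its tab-separated fields are unquoted. The function names are hypothetical.

```python
import csv
import io


def load_recognition_results(tsv_text: str) -> dict[str, str]:
    """Map each audio.input.id to its recognizer.recognized.result.text value."""
    # QUOTE_NONE treats quote characters in the recognized text as ordinary characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["audio.input.id"]: row["recognizer.recognized.result.text"]
        for row in reader
    }


def load_recognition_file(path: str) -> dict[str, str]:
    """Read speech_output.tsv from disk and parse it."""
    # utf-8-sig accepts UTF-8 with or without a byte order mark (the encoding is an assumption).
    with open(path, encoding="utf-8-sig", newline="") as f:
        return load_recognition_results(f.read())
```

For example, load_recognition_file(r"C:\output_dir\speech_output.tsv")["sample_1"] returns the text recognized for sample_1.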

Batch text-to-speech synthesis

The easiest way to run batch text-to-speech is to create a new .tsv (tab-separated values) file and use the --foreach argument of the Speech CLI. Consider the following file, text_synthesis.tsv:

audio.output    text
C:\batch_wav_output\wav_1.wav    Sample text to synthesize.
C:\batch_wav_output\wav_2.wav    Using the Speech CLI to run batch-synthesis.
C:\batch_wav_output\wav_3.wav    Some more text to test capabilities.
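If your input text lives elsewhere, such as in a spreadsheet or a database, you can generate this file instead of writing it by hand. The following Python sketch builds the two-column file with a header row. It rejects any value that contains a tab or line break, because either one would break the tab-separated layout. The function names are hypothetical, and UTF-8 output is an assumption about what spx accepts.

```python
def format_synthesis_tsv(rows: list[tuple[str, str]]) -> str:
    """Return TSV text with an audio.output/text header and one line per (path, text) pair."""
    lines = ["audio.output\ttext"]
    for output_path, text in rows:
        for value in (output_path, text):
            # A tab or line break inside a value would shift or split the columns.
            if any(ch in value for ch in "\t\r\n"):
                raise ValueError(f"value contains a tab or line break: {value!r}")
        lines.append(f"{output_path}\t{text}")
    return "\n".join(lines) + "\n"


def write_synthesis_tsv(path: str, rows: list[tuple[str, str]]) -> None:
    """Write the generated TSV text to disk (UTF-8 is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_synthesis_tsv(rows))
```

For example, write_synthesis_tsv(r"C:\your\path\to\text_synthesis.tsv", [(r"C:\batch_wav_output\wav_1.wav", "Sample text to synthesize.")]) writes the header row and the first data row of the file above.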

Next, run a command that points to text_synthesis.tsv, performs synthesis on each text field, and writes each result as a .wav file to the corresponding audio.output path.

spx synthesize --foreach in @C:\your\path\to\text_synthesis.tsv

For the first record, this command is equivalent to running spx synthesize --text "Sample text to synthesize." --audio output C:\batch_wav_output\wav_1.wav, and the corresponding command is run for each remaining record in the .tsv file. A few things to note:

  • The column headers, audio.output and text, correspond to the command-line arguments --audio output and --text, respectively. Write multi-part command-line arguments like --audio output in the file with no spaces, no leading dashes, and periods separating the parts, for example audio.output. You can add any other existing command-line argument to the file as an additional column by following this pattern.
  • When the file is formatted this way, you don't need to pass any additional arguments to --foreach.
  • Make sure that each value in the .tsv file is separated by a tab.
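To make the header-to-argument rule concrete, the following Python sketch expands each row of such a file into the argument list of the equivalent single spx synthesize call. It only illustrates the mapping described above. It is not how the Speech CLI implements --foreach. The function names are hypothetical. The sketch also assumes that the first part of a dotted header becomes the --flag and any remaining parts follow it as separate words, as with audio.output and --audio output.

```python
def column_to_argument(column: str) -> list[str]:
    """Turn a header such as 'audio.output' into ['--audio', 'output'], or 'text' into ['--text']."""
    first, *rest = column.split(".")
    return ["--" + first, *rest]


def expand_rows(tsv_text: str) -> list[list[str]]:
    """Build one illustrative `spx synthesize` argument list per data row."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    headers = lines[0].split("\t")
    commands = []
    for line in lines[1:]:
        values = line.split("\t")
        if len(values) != len(headers):
            # Usually a sign that spaces were used instead of a tab.
            raise ValueError(f"expected {len(headers)} tab-separated values: {line!r}")
        command = ["spx", "synthesize"]
        for header, value in zip(headers, values):
            command += column_to_argument(header) + [value]
        commands.append(command)
    return commands
```

Each resulting list could be passed to subprocess.run, but --foreach already performs this iteration for you, so the sketch is mainly useful for checking that a file's headers map to the arguments you expect.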

However, you might have a .tsv file like the following example, with column headers that don't match the command-line arguments:

wav_path    str_text
C:\batch_wav_output\wav_1.wav    Sample text to synthesize.
C:\batch_wav_output\wav_2.wav    Using the Speech CLI to run batch-synthesis.
C:\batch_wav_output\wav_3.wav    Some more text to test capabilities.

In that case, you can map these field names to the correct arguments by using the following syntax in the --foreach call. This call is equivalent to the earlier one.

spx synthesize --foreach audio.output;text in @C:\your\path\to\text_synthesis.tsv
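As an alternative to overriding the names in the --foreach call, you can rewrite the header row once. The file then follows the naming pattern described earlier and works with the shorter spx synthesize --foreach in @C:\your\path\to\text_synthesis.tsv command. The following Python sketch replaces only the header row and leaves the data rows and line endings unchanged. The function name is hypothetical.

```python
def rename_header(tsv_text: str, new_names: list[str]) -> str:
    """Replace the column names in the first line of tsv_text with new_names."""
    lines = tsv_text.splitlines(keepends=True)
    if not lines:
        raise ValueError("the file is empty")
    header = lines[0].rstrip("\r\n")
    ending = lines[0][len(header):]  # preserve the original line ending, if any
    old_names = header.split("\t")
    if len(old_names) != len(new_names):
        raise ValueError(f"expected {len(old_names)} names, got {len(new_names)}")
    lines[0] = "\t".join(new_names) + ending
    return "".join(lines)
```

For the file above, call rename_header(text, ["audio.output", "text"]). Read the file with open(path, encoding="utf-8", newline="") so that its line endings are preserved, and write the result to a new file before you run the shorter command against it.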

Next steps