How to use the audio input stream

2025/03/18

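The Speech SDK lets you stream audio into the recognizer as an alternative to microphone or file input.

This guide describes how to use audio input streams. It also describes some of the requirements and limitations of the audio input stream.

See more examples of speech to text recognition with audio input streams on GitHub.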

Identify the format of the audio stream

Start by identifying the format of your audio stream. Supported audio samples are:

PCM format (int-16, signed)
One channel
16 bits per sample, 8,000 or 16,000 samples per second (16,000 bytes or 32,000 bytes per second)
Two-block aligned (16 bits including padding for a sample)
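The byte rates in the list above follow directly from the sample rate, sample width, and channel count. The following is a minimal sketch of that arithmetic; the `PcmFormatMath` class and its method names are hypothetical helpers for illustration and are not part of the Speech SDK.

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK) that reproduces the
// arithmetic behind the supported-format list.
public static class PcmFormatMath
{
    // Bytes per second = samples per second * bytes per sample * channels.
    public static uint BytesPerSecond(uint samplesPerSecond, byte bitsPerSample, byte channels)
        => samplesPerSecond * (uint)(bitsPerSample / 8) * channels;

    // Block alignment = bytes occupied by one sample across all channels.
    public static uint BlockAlign(byte bitsPerSample, byte channels)
        => (uint)(bitsPerSample / 8) * channels;
}
```

For 16-bit mono audio, `PcmFormatMath.BytesPerSecond(16000, 16, 1)` gives 32,000 and `PcmFormatMath.BytesPerSecond(8000, 16, 1)` gives 16,000, while `PcmFormatMath.BlockAlign(16, 1)` gives the two-byte alignment.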

The corresponding code in the SDK to create the audio format looks like this example:

byte channels = 1;
byte bitsPerSample = 16;
uint samplesPerSecond = 16000; // or 8000
var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);

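Make sure that your code provides the RAW audio data according to these specifications. Also make sure that 16-bit samples arrive in little-endian format. If your audio source data doesn't match the supported formats, the audio must be transcoded into the required format.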
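If your samples are held in memory as 16-bit integers, one way to produce little-endian bytes regardless of the host's byte order is sketched below. The `PcmBytes` class and the `ToLittleEndianBytes` method are hypothetical names used only for illustration; `BinaryPrimitives` comes from the .NET base class library, not from the Speech SDK.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not part of the Speech SDK).
public static class PcmBytes
{
    // Serializes signed 16-bit samples as little-endian bytes:
    // the low byte of each sample comes first, then the high byte.
    public static byte[] ToLittleEndianBytes(short[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            BinaryPrimitives.WriteInt16LittleEndian(bytes.AsSpan(i * 2, 2), samples[i]);
        }
        return bytes;
    }
}
```

For example, the sample value `0x1234` is written as the bytes `0x34, 0x12`.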

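Create your own audio input stream class

You can create your own audio input stream class derived from PullAudioInputStreamCallback. Implement the Read() and Close() members. The exact function signature is language-dependent, but the code looks similar to this code sample: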

public class ContosoAudioStream : PullAudioInputStreamCallback 
{
    public ContosoAudioStream() {}

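    public override int Read(byte[] buffer, uint size)
    {
        // Copy up to `size` bytes of audio into `buffer` and return the number of bytes written.
        // Returning 0 signals that the stream has ended.
        // E.g., return read(config.YYY, buffer, size);
        return 0;
    }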

    public override void Close() 
    {
        // Close and clean up resources.
    }
}
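The skeleton above returns 0 immediately, so it never delivers audio. As one possible way to fill it in, the following sketch assumes the raw PCM data is available through a `System.IO.Stream`. The class name `StreamBackedAudioStream` is hypothetical, and the sketch assumes the Speech SDK package is referenced by the project.

```csharp
using System;
using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical example class: pulls raw PCM bytes from any System.IO.Stream.
public sealed class StreamBackedAudioStream : PullAudioInputStreamCallback
{
    private readonly Stream _source;

    public StreamBackedAudioStream(Stream source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public override int Read(byte[] buffer, uint size)
    {
        // Never read more than the caller asked for or than the buffer can hold.
        int count = (int)Math.Min(size, (uint)buffer.Length);

        // Stream.Read returns 0 at end of stream, which also ends the audio input.
        return _source.Read(buffer, 0, count);
    }

    public override void Close()
    {
        // Release the underlying stream when the SDK is done with it.
        _source.Dispose();
    }
}
```

An instance wrapping, say, a `MemoryStream` or a `FileStream` over headerless PCM data can then be passed to `AudioConfig.FromStreamInput` in place of `ContosoAudioStream`.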

Create an audio configuration based on your audio format and custom audio input stream. For example:

var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(), audioFormat);

Here's how the custom audio input stream is used in the context of a speech recognizer:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public class ContosoAudioStream : PullAudioInputStreamCallback 
{
    public ContosoAudioStream() {}

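    public override int Read(byte[] buffer, uint size)
    {
        // Copy up to `size` bytes of audio into `buffer` and return the number of bytes written.
        // Returning 0 signals that the stream has ended.
        // E.g., return read(config.YYY, buffer, size);
        return 0;
    }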

    public override void Close() 
    {
        // Close and clean up resources.
    }
}

class Program 
{
    static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
    static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");

    async static Task Main(string[] args)
    {
        byte channels = 1;
        byte bitsPerSample = 16;
        uint samplesPerSecond = 16000; // or 8000
        var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
        var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(), audioFormat);

        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion); 
        speechConfig.SpeechRecognitionLanguage = "en-US";
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // The audio comes from the custom input stream, not from a microphone.
        Console.WriteLine("Recognizing speech from the audio input stream.");
        var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={speechRecognitionResult.Text}");
    }
}
