About the Speech SDK audio input stream API

The Speech SDK's audio input stream API lets you stream audio into the recognizers instead of using the microphone or the input file APIs.

The following steps are required when using audio input streams:

  • Identify the format of the audio stream. The format must be supported by the Speech SDK and the Speech service. Currently, only the following configuration is supported:

    Audio samples in PCM format, one channel, 16 bits per sample, 8000 or 16000 samples per second (16000 or 32000 bytes per second), and a block align of two bytes (16 bits per sample, including padding).

    The corresponding code in the SDK to create the audio format looks like this:

    byte channels = 1;
    byte bitsPerSample = 16;
    uint samplesPerSecond = 16000; // or 8000
    var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
    
  • Make sure your code can provide the raw audio data according to these specifications. If your audio source data doesn't match the supported format, the audio must be transcoded into the required format.

  • Create your own audio input stream class derived from PullAudioInputStreamCallback. Implement the Read() and Close() members. The exact function signatures are language-dependent, but the code will look similar to this sample:

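    public class ContosoAudioStream : PullAudioInputStreamCallback {
        ContosoConfig config;
    
        public ContosoAudioStream(ContosoConfig config) {
            this.config = config;
        }
    
        // Copies up to `size` bytes of audio into `buffer` and returns the
        // number of bytes written; returning 0 signals the end of the stream.
        public override int Read(byte[] buffer, uint size) {
            // e.g. return read(config.YYY, buffer, size);
            throw new System.NotImplementedException("Replace with a read from your audio source.");
        }
    
        public override void Close() {
            // close and cleanup resources.
        }
    }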
    
  • Create an audio configuration based on your audio format and input stream. Pass in both your regular speech configuration and the audio input configuration when you create your recognizer. For example:

    var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(config), audioFormat);
    
    var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
    
    // run stream through recognizer (must be called from an async method)
    var result = await recognizer.RecognizeOnceAsync();
    
    // the recognized text is exposed through the Text property
    var text = result.Text;
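
The byte rates quoted in the format step above follow directly from the format parameters: block align is channels × bits per sample / 8, and the byte rate is samples per second × block align. A small worked example in plain C# may make this concrete. It makes no Speech SDK calls, and the `PcmFormat` helper name is invented here for illustration only.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Speech SDK.
public static class PcmFormat
{
    // Bytes occupied by one sample frame (all channels together).
    public static int BlockAlign(int channels, int bitsPerSample) =>
        channels * bitsPerSample / 8;

    // Bytes of raw audio produced per second.
    public static int BytesPerSecond(int samplesPerSecond, int channels, int bitsPerSample) =>
        samplesPerSecond * BlockAlign(channels, bitsPerSample);

    public static void Main()
    {
        Console.WriteLine(BlockAlign(1, 16));            // 2
        Console.WriteLine(BytesPerSecond(8000, 1, 16));  // 16000
        Console.WriteLine(BytesPerSecond(16000, 1, 16)); // 32000
    }
}
```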
    
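When the source audio does not match the supported format, it has to be transcoded before it reaches the recognizer. As one narrow illustration, the sketch below downmixes interleaved 16-bit little-endian stereo PCM to mono by averaging the two channels. It assumes the sample rate is already 8000 or 16000 and does not resample; `PcmDownmix` is a hypothetical helper, not an SDK type.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Speech SDK.
public static class PcmDownmix
{
    // Converts interleaved 16-bit little-endian stereo PCM to mono
    // by averaging the left and right sample of each frame.
    public static byte[] StereoToMono(byte[] stereo)
    {
        if (stereo.Length % 4 != 0)
            throw new ArgumentException("Expected whole 4-byte stereo frames.", nameof(stereo));

        var mono = new byte[stereo.Length / 2];
        for (int i = 0, o = 0; i < stereo.Length; i += 4, o += 2)
        {
            // Reassemble each little-endian 16-bit sample from its two bytes.
            short left = (short)(stereo[i] | (stereo[i + 1] << 8));
            short right = (short)(stereo[i + 2] | (stereo[i + 3] << 8));
            short mixed = (short)((left + right) / 2);

            // Write the averaged sample back as little-endian bytes.
            mono[o] = (byte)(mixed & 0xFF);
            mono[o + 1] = (byte)((mixed >> 8) & 0xFF);
        }
        return mono;
    }

    public static void Main()
    {
        // One stereo frame: left = 1000 (E8 03), right = 3000 (B8 0B).
        byte[] stereo = { 0xE8, 0x03, 0xB8, 0x0B };
        byte[] mono = StereoToMono(stereo);
        Console.WriteLine(mono[0] | (mono[1] << 8)); // 2000
    }
}
```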

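As a more concrete variant of the ContosoAudioStream skeleton above, the following sketch backs the callback with any System.IO.Stream. It assumes the project references the Microsoft.CognitiveServices.Speech package and that the stream already contains headerless raw PCM in the supported format; `StreamBackedAudioInput` is a name chosen here for illustration, not an SDK type.

```csharp
using System;
using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

// Sketch: feeds raw PCM bytes from a System.IO.Stream to the recognizer.
// StreamBackedAudioInput is a hypothetical name, not an SDK type.
public sealed class StreamBackedAudioInput : PullAudioInputStreamCallback
{
    private readonly Stream source;

    public StreamBackedAudioInput(Stream source)
    {
        this.source = source ?? throw new ArgumentNullException(nameof(source));
    }

    // Copies up to `size` bytes into the buffer and returns the number
    // of bytes written; returning 0 tells the SDK the stream has ended.
    public override int Read(byte[] dataBuffer, uint size)
    {
        return source.Read(dataBuffer, 0, (int)size);
    }

    // Releases the underlying stream when the SDK closes the input.
    public override void Close()
    {
        source.Dispose();
    }
}
```

An instance can then be passed to AudioConfig.FromStreamInput in place of ContosoAudioStream, together with the same audioFormat.
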
Next steps