Speech Devices SDK microphone array recommendations

In this article, you learn how to design a microphone array for the Speech Devices SDK.

The Speech Devices SDK works best with a microphone array designed according to the following guidelines, which cover microphone geometry and component selection. Guidance is also given on integration and electrical considerations.

Microphone geometry

The following array geometries are recommended for use with the Microsoft Audio Stack. Sound-source localization and ambient-noise rejection improve as the number of microphones increases, although the benefit depends on the specific application, user scenario, and device form factor.

| Mics & geometry | 7-mic circular array | 4-mic circular array | 4-mic linear array | 2-mic linear array |
|---|---|---|---|---|
| # Mics | 7 | 4 | 4 | 2 |
| Geometry | 6 outer, 1 center, radius = 42.5 mm, evenly spaced | 3 outer, 1 center, radius = 42.5 mm, evenly spaced | Length = 120 mm, spacing = 40 mm | Spacing = 40 mm |
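As a way to check a layout against the table, the sketch below computes microphone coordinates (in millimetres) for each geometry and derives simple quantities such as array aperture. It assumes, for illustration only, that channel 0 is the center microphone and that outer microphones are numbered counter-clockwise from the +x axis; verify this against the channel numbering your array actually uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in millimetres


def circular_array(n_outer: int, radius_mm: float = 42.5) -> List[Point]:
    """Center mic first, then n_outer mics evenly spaced on a circle.

    Assumption: channel 0 = center mic; outer mics counter-clockwise from +x.
    """
    points: List[Point] = [(0.0, 0.0)]
    for k in range(n_outer):
        angle = 2 * math.pi * k / n_outer
        points.append((radius_mm * math.cos(angle), radius_mm * math.sin(angle)))
    return points


def linear_array(n_mics: int, spacing_mm: float = 40.0) -> List[Point]:
    """Mics evenly spaced along the x axis, centered on the origin."""
    length = spacing_mm * (n_mics - 1)
    return [(-length / 2 + i * spacing_mm, 0.0) for i in range(n_mics)]


def distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


GEOMETRIES = {
    "7-mic circular": circular_array(6),
    "4-mic circular": circular_array(3),
    "4-mic linear": linear_array(4),
    "2-mic linear": linear_array(2),
}

if __name__ == "__main__":
    for name, pts in GEOMETRIES.items():
        # Aperture = largest distance between any two microphones.
        aperture = max(distance(p, q) for p in pts for q in pts)
        print(f"{name}: {len(pts)} mics, aperture {aperture:.1f} mm")
```

Note that with six evenly spaced outer microphones, the spacing between neighbors equals the radius (42.5 mm), and the 4-mic linear array's 120 mm length is exactly three 40 mm gaps.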

Microphone channels should be ordered according to the microphone numbering of each array above, increasing from 0. The Microsoft Audio Stack also requires an additional reference stream of the audio playback to perform echo cancellation.
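To illustrate the ordering requirement, here is a minimal sketch that interleaves per-microphone sample buffers in channel-index order and appends a playback reference channel. Placing the reference as the last channel is an assumption made for this example, not a documented layout; `interleave_frames` is a hypothetical helper.

```python
from typing import List, Sequence


def interleave_frames(mic_channels: Sequence[Sequence[int]],
                      reference: Sequence[int]) -> List[int]:
    """Interleave per-channel samples into one multichannel buffer.

    mic_channels[i] holds the samples of microphone i, so the list index is
    the channel number (0, 1, 2, ...). The loopback reference is appended as
    the final channel; that placement is an assumption for illustration.
    """
    channels = list(mic_channels) + [reference]
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    out: List[int] = []
    for t in range(n_samples):          # one frame per sample instant
        for ch in channels:             # channels in ascending index order
            out.append(ch[t])
    return out


if __name__ == "__main__":
    mics = [[10, 11], [20, 21], [30, 31], [40, 41]]  # 4 mics, 2 samples each
    ref = [90, 91]                                   # playback reference
    print(interleave_frames(mics, ref))
```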

Component selection

Select microphone components that accurately reproduce a signal free of noise and distortion.

The recommended properties when selecting microphones are:

| Parameter | Recommended |
|---|---|
| SNR | ≥ 65 dB (1 kHz signal at 94 dBSPL, A-weighted noise) |
| Amplitude matching | ± 1 dB @ 1 kHz |
| Phase matching | ± 2° @ 1 kHz |
| Acoustic overload point (AOP) | ≥ 120 dBSPL (THD = 10%) |
| Bit depth | Minimum 24-bit |
| Sampling rate | Minimum 16 kHz* |
| Frequency response | ± 3 dB, 200-8000 Hz floating mask* |
| Reliability | Storage temperature range: -40°C to 70°C; operating temperature range: -20°C to 55°C |

*Higher sampling rates or "wider" frequency ranges may be necessary for high-quality communications (VoIP) applications.

Good component selection must be paired with good electroacoustic integration to avoid impairing the performance of the components used. Unique use cases may also impose additional requirements (for example, wider operating temperature ranges).
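A simple way to screen candidate parts is to encode the recommended component properties as checks against datasheet values. The sketch below does that; the `MicSpec` field names are hypothetical, and each threshold is copied from the component table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MicSpec:
    """Datasheet values for one microphone part (hypothetical field names)."""
    snr_db: float                          # 1 kHz, 94 dBSPL, A-weighted noise
    amplitude_match_db: float              # +/- tolerance at 1 kHz
    phase_match_deg: float                 # +/- tolerance at 1 kHz
    aop_dbspl: float                       # acoustic overload point at 10% THD
    bit_depth: int
    sample_rate_hz: int
    freq_response_tol_db: float            # +/- tolerance over 200-8000 Hz
    storage_temp_c: Tuple[float, float]    # (min, max)
    operating_temp_c: Tuple[float, float]  # (min, max)


def covers(actual: Tuple[float, float], required: Tuple[float, float]) -> bool:
    """True if the actual temperature range contains the required range."""
    return actual[0] <= required[0] and actual[1] >= required[1]


def check_component(spec: MicSpec) -> List[str]:
    """List every recommended component property the part fails to meet."""
    problems = []
    if spec.snr_db < 65:
        problems.append("SNR below 65 dB")
    if spec.amplitude_match_db > 1:
        problems.append("amplitude matching looser than +/-1 dB @ 1 kHz")
    if spec.phase_match_deg > 2:
        problems.append("phase matching looser than +/-2 degrees @ 1 kHz")
    if spec.aop_dbspl < 120:
        problems.append("AOP below 120 dBSPL")
    if spec.bit_depth < 24:
        problems.append("bit depth below 24 bits")
    if spec.sample_rate_hz < 16_000:
        problems.append("sampling rate below 16 kHz")
    if spec.freq_response_tol_db > 3:
        problems.append("frequency response looser than +/-3 dB, 200-8000 Hz")
    if not covers(spec.storage_temp_c, (-40, 70)):
        problems.append("storage temperature range narrower than -40 to 70 C")
    if not covers(spec.operating_temp_c, (-20, 55)):
        problems.append("operating temperature range narrower than -20 to 55 C")
    return problems


if __name__ == "__main__":
    part = MicSpec(66, 1, 2, 122, 24, 48_000, 3, (-40, 85), (-40, 85))
    print(check_component(part))  # []
```

Passing these checks covers only the component level; as noted above, integration can still degrade performance.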

Microphone array integration

The performance of a microphone array integrated into a device will differ from the component specification, so it is important to ensure that the microphones remain well matched after integration. Device performance measured after any fixed gain or EQ should therefore meet the following recommendations:

| Parameter | Recommended |
|---|---|
| SNR | > 63 dB (1 kHz signal at 94 dBSPL, A-weighted noise) |
| Output sensitivity | -26 dBFS/Pa @ 1 kHz (recommended) |
| Amplitude matching | ± 2 dB, 200-8000 Hz |
| THD%* | ≤ 1%, 200-8000 Hz, 94 dBSPL, 5th order |
| Frequency response | ± 6 dB, 200-8000 Hz floating mask** |

*A low-distortion speaker is required to measure THD (for example, Neumann KH120).

**"Wider" frequency ranges may be necessary for high-quality communications (VoIP) applications.
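The THD% figure above counts harmonics up to the 5th order. As a sketch of one common way to compute it, the code below measures each harmonic's amplitude with a single-bin DFT and divides the root-sum-square of harmonics 2 through 5 by the fundamental. It assumes the captured record spans a whole number of cycles of the test tone (no window is applied); `tone_amplitude` and `thd_percent` are hypothetical helpers, not SDK functions.

```python
import math
from typing import Sequence


def tone_amplitude(signal: Sequence[float], freq_hz: float, fs_hz: float) -> float:
    """Peak amplitude of one sinusoidal component via a single-bin DFT.

    Exact only when the record spans an integer number of cycles of freq_hz;
    this sketch applies no window.
    """
    n = len(signal)
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def thd_percent(signal: Sequence[float], f0_hz: float, fs_hz: float,
                max_order: int = 5) -> float:
    """THD% = root-sum-square of harmonics 2..max_order over the fundamental."""
    if max_order * f0_hz >= fs_hz / 2:
        raise ValueError("highest harmonic is at or above Nyquist; raise fs_hz")
    fundamental = tone_amplitude(signal, f0_hz, fs_hz)
    harmonics = [tone_amplitude(signal, k * f0_hz, fs_hz)
                 for k in range(2, max_order + 1)]
    return 100 * math.sqrt(sum(h * h for h in harmonics)) / fundamental


if __name__ == "__main__":
    fs, f0, n = 16_000, 1_000, 1_600  # 100 full cycles of the 1 kHz tone
    tone = [math.sin(2 * math.pi * f0 * i / fs)
            + 0.01 * math.sin(2 * math.pi * 2 * f0 * i / fs) for i in range(n)]
    print(f"THD = {thd_percent(tone, f0, fs):.2f} %")
```

The Nyquist check matters at the top of the 200-8000 Hz band: a 5th-order measurement of an 8 kHz tone needs content up to 40 kHz, so the capture rate must be well above the 16 kHz minimum.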

Speaker integration recommendations

Because echo cancellation is necessary for speech recognition devices that contain speakers, additional recommendations are provided for speaker selection and integration.

| Parameter | Recommended |
|---|---|
| Linearity considerations | No non-linear processing after the speaker reference; otherwise, a hardware-based loopback reference stream is required |
| Speaker loopback | Provided via WASAPI, private APIs, a custom ALSA plug-in (Linux), or a firmware channel |
| THD% | 1/3-octave bands, minimum 5th order, 70 dBA playback @ 0.8 m: ≤ 6.3% for 315-500 Hz; ≤ 5% for 630-5000 Hz |
| Echo coupling to microphones | > -10 dB TCLw using the ITU-T G.122 Annex B.4 method, normalized to mic level: TCLw = TCLw_measured + (Measured level - Target output sensitivity), that is, TCLw = TCLw_measured + (Measured level - (-26)) |
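The normalization in the echo-coupling row and the per-band speaker THD limits can be expressed directly. In this sketch, "measured level" is assumed to be expressed in the same units as the -26 dBFS/Pa target output sensitivity; the function names are hypothetical, and the band limits are taken from the table.

```python
TARGET_OUTPUT_SENSITIVITY_DBFS = -26.0  # dBFS/Pa @ 1 kHz, from the integration table


def normalized_tclw(tclw_measured_db: float, measured_level_dbfs: float,
                    target_sensitivity_dbfs: float = TARGET_OUTPUT_SENSITIVITY_DBFS) -> float:
    """TCLw = TCLw_measured + (measured level - target output sensitivity)."""
    return tclw_measured_db + (measured_level_dbfs - target_sensitivity_dbfs)


def echo_coupling_ok(tclw_db: float) -> bool:
    """Recommendation: normalized TCLw greater than -10 dB."""
    return tclw_db > -10.0


def speaker_thd_limit_percent(band_center_hz: float) -> float:
    """Maximum speaker THD% for a 1/3-octave band (70 dBA playback @ 0.8 m)."""
    if 315 <= band_center_hz <= 500:
        return 6.3
    if 630 <= band_center_hz <= 5000:
        return 5.0
    raise ValueError("no THD limit is given for this band")


if __name__ == "__main__":
    tclw = normalized_tclw(tclw_measured_db=-15.0, measured_level_dbfs=-20.0)
    print(tclw, echo_coupling_ok(tclw))
    for band in (315, 400, 500, 630, 1000, 5000):
        print(band, "Hz ->", speaker_thd_limit_percent(band), "%")
```

A device whose microphones are louder than the target sensitivity gets its measured TCLw shifted upward by the difference, so results from arrays with different gains can be compared on one scale.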

Integration design architecture

The following architecture guidelines are necessary when integrating microphones into a device:

| Parameter | Recommendation |
|---|---|
| Mic port similarity | All microphone ports in the array are the same length |
| Mic port dimensions | Port size Ø0.8-1.0 mm; port length / port diameter < 2 |
| Mic sealing | Sealing gaskets uniformly implemented in the stack-up; > 70% compression ratio recommended for foam gaskets |
| Mic reliability | Use mesh to prevent dust and ingress (between the PCB and the sealing gasket/top cover for bottom-ported microphones) |
| Mic isolation | Rubber gaskets and vibration decoupling through the structure, particularly to isolate any vibration paths caused by integrated speakers |
| Sampling clock | Device audio must be free of jitter and drop-outs, with low drift |
| Record capability | The device must be able to record the raw stream of each individual channel simultaneously |
| USB | All USB audio input devices must set descriptors according to the USB Audio Devices Rev3 Spec |
| Microphone geometry | Drivers must implement Microphone Array Geometry Descriptors correctly |
| Discoverability | Devices must not have any undiscoverable or uncontrollable hardware, firmware, or third-party software-based non-linear audio processing algorithms to or from the device |
| Capture format | Capture formats must use a minimum sampling rate of 16 kHz and a recommended bit depth of 24 bits |
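Several rows in this table are numeric and can be checked early in mechanical design. The sketch below checks port dimensions, port-length uniformity, foam-gasket compression, and capture format. `PortDesign` and the check functions are hypothetical, and defining the compression ratio as (uncompressed thickness - compressed thickness) / uncompressed thickness is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PortDesign:
    diameter_mm: float
    length_mm: float


def check_ports(ports: Sequence[PortDesign]) -> List[str]:
    """Port similarity and dimension rules from the architecture table."""
    issues = []
    if len({p.length_mm for p in ports}) > 1:
        issues.append("port lengths differ across the array")
    for i, p in enumerate(ports):
        if not 0.8 <= p.diameter_mm <= 1.0:
            issues.append(f"port {i}: diameter {p.diameter_mm} mm outside 0.8-1.0 mm")
        if p.length_mm / p.diameter_mm >= 2:
            issues.append(f"port {i}: length/diameter ratio is not below 2")
    return issues


def foam_compression_ok(uncompressed_mm: float, compressed_mm: float) -> bool:
    """Assumed definition: compression = thickness reduction / original thickness."""
    return (uncompressed_mm - compressed_mm) / uncompressed_mm > 0.70


def check_capture_format(sample_rate_hz: int, bit_depth: int) -> List[str]:
    issues = []
    if sample_rate_hz < 16_000:
        issues.append("sampling rate below the 16 kHz minimum")
    if bit_depth < 24:
        issues.append("bit depth below the recommended 24 bits")
    return issues


if __name__ == "__main__":
    ports = [PortDesign(0.9, 1.5)] * 4
    print(check_ports(ports), foam_compression_ok(2.0, 0.5),
          check_capture_format(16_000, 24))
```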

Electrical architecture considerations

Where applicable, arrays may be connected to a USB host (such as an SoC that runs the Microsoft Audio Stack) that interfaces with Speech services or other applications.

Hardware components such as PDM-to-TDM converters should ensure that the dynamic range and SNR of the microphones are preserved within resamplers.
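One quick sanity check for any intermediate stage is whether its word length can carry the microphone's dynamic range. The sketch below takes the mic's dynamic range as AOP minus the noise floor (94 dBSPL minus SNR) and compares it with the ideal quantization SNR of an N-bit stage, 6.02·N + 1.76 dB for a full-scale sine. It considers quantization only; resampler filter noise and ripple are outside this sketch, and the function names are hypothetical.

```python
def mic_dynamic_range_db(snr_db: float, aop_dbspl: float,
                         ref_dbspl: float = 94.0) -> float:
    """Dynamic range = AOP - noise floor, where noise floor = 94 dBSPL - SNR."""
    noise_floor_dbspl = ref_dbspl - snr_db
    return aop_dbspl - noise_floor_dbspl


def ideal_quantization_snr_db(bits: int) -> float:
    """Ideal SNR of an N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76


def stage_preserves_range(bits: int, mic_range_db: float,
                          margin_db: float = 0.0) -> bool:
    return ideal_quantization_snr_db(bits) >= mic_range_db + margin_db


if __name__ == "__main__":
    # Component-table minimums: SNR 65 dB, AOP 120 dBSPL -> 91 dB range.
    dr = mic_dynamic_range_db(snr_db=65, aop_dbspl=120)
    for bits in (12, 16, 24):
        print(bits, "bits:", round(ideal_quantization_snr_db(bits), 2),
              "dB, preserves range:", stage_preserves_range(bits, dr))
```

With the component-table minimums, a 16-bit stage leaves only about 7 dB of margin over the 91 dB range, which is one reason the table recommends 24-bit depth.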

High-speed USB Audio Class 2.0 should be supported by any audio MCU to provide the necessary bandwidth for up to seven channels at higher sample rates and bit depths.
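The bandwidth argument can be checked with back-of-envelope arithmetic. A USB 2.0 full-speed isochronous endpoint carries at most 1023 bytes per 1 ms frame, while a high-speed high-bandwidth endpoint carries up to 3 × 1024 bytes per 125 µs microframe. The sketch below sizes one packet for a given channel count, rate, and subslot size (bytes per sample). It assumes a single endpoint, rounds samples per packet up, and adds no extra headroom sample for rate adaptation, which real devices often reserve.

```python
FULL_SPEED = {"interval_us": 1000, "max_bytes": 1023}      # 1 ms frame
HIGH_SPEED = {"interval_us": 125, "max_bytes": 3 * 1024}   # 125 us microframe


def bytes_per_interval(channels: int, sample_rate_hz: int,
                       subslot_bytes: int, interval_us: int) -> int:
    """Isochronous payload for one service interval, rounded up to whole samples."""
    samples = -(-sample_rate_hz * interval_us // 1_000_000)  # ceiling division
    return channels * samples * subslot_bytes


def fits(channels: int, sample_rate_hz: int, subslot_bytes: int, bus: dict) -> bool:
    needed = bytes_per_interval(channels, sample_rate_hz, subslot_bytes,
                                bus["interval_us"])
    return needed <= bus["max_bytes"]


if __name__ == "__main__":
    for ch, fs, sub in ((7, 48_000, 3), (7, 96_000, 3), (8, 48_000, 4)):
        print(f"{ch} ch, {fs} Hz, {sub}-byte subslots: "
              f"full-speed {fits(ch, fs, sub, FULL_SPEED)}, "
              f"high-speed {fits(ch, fs, sub, HIGH_SPEED)}")
```

Seven channels of 24-bit audio at 48 kHz (1008 bytes per frame) only just fit within the full-speed limit, and higher rates, 4-byte subslots, or an added reference channel exceed it, whereas high-speed transfers leave ample headroom.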

Next steps