Maximizing Predicted Signal-to-Distortion Ratio: A New Microphone Selection Criterion for Beamforming in Acoustic Sensor Networks
Abstract: This paper addresses the problem of selecting an effective subset of microphones in acoustic sensor networks (ASNs) for speech enhancement applications. A basic approach to this problem is to select a subset of microphones such that the output signal-to-noise ratio (oSNR) of the beamforming output signal is maximized. However, oSNR does not necessarily correlate well with widely used signal quality measures such as the signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI). We here introduce a new measure that predicts SDR for beamforming output signals. This measure, termed predicted SDR (pSDR), demonstrates a better correlation with SDR, PESQ, and STOI compared to oSNR. Whereas the original SDR requires an oracle reference signal for measurements, the proposed pSDR is defined in the short-time Fourier transform domain and can be estimated solely from source covariance matrices already used for obtaining beamformers. Therefore, our pSDR can be immediately used as a measure for selecting microphones instead of oSNR. We also extend conventional subset selection methods to jointly select a reference microphone of beamformers in a unified optimization problem. This extension is crucial because the choice of the reference microphone can significantly affect the quality of enhanced signals in ASNs. Numerical experiments suggest that, as a measure for selecting both a microphone subset and a reference microphone, pSDR is generally better than, or at least comparable to, oSNR in terms of SDR, PESQ, and STOI. We also demonstrate the effectiveness of selecting a microphone subset jointly with a reference microphone of beamformers.
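The oSNR-based baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method and does not implement pSDR, whose definition is not given here. The sketch assumes a single frequency bin, known target and noise covariance matrices (`Rs`, `Rn`), a max-SNR (generalized eigenvalue) beamformer on each candidate subset, and an exhaustive search over subsets of size `k`. All function and variable names are hypothetical.

```python
import itertools

import numpy as np


def output_snr(Rs, Rn, subset):
    """oSNR of the max-SNR beamformer restricted to `subset`.

    Assumption: the beamformer maximizes w^H Rs w / w^H Rn w, so its oSNR
    equals the largest generalized eigenvalue of the pair (Rs_sub, Rn_sub).
    """
    idx = np.ix_(subset, subset)
    Rs_sub, Rn_sub = Rs[idx], Rn[idx]
    # Eigenvalues of Rn_sub^{-1} Rs_sub are the generalized eigenvalues.
    eigvals = np.linalg.eigvals(np.linalg.solve(Rn_sub, Rs_sub))
    return float(np.max(eigvals.real))


def select_subset(Rs, Rn, k):
    """Exhaustively pick the k-microphone subset with the highest oSNR."""
    n_mics = Rs.shape[0]
    candidates = (list(c) for c in itertools.combinations(range(n_mics), k))
    best = max(candidates, key=lambda s: output_snr(Rs, Rn, s))
    return best, output_snr(Rs, Rn, best)


# Toy data (assumed): rank-1 target with transfer vector `a`,
# spatially uncorrelated noise with a different power at each microphone.
a = np.array([1.0, 0.5j, 2.0, 0.1])
Rs = np.outer(a, a.conj())
Rn = np.diag([1.0, 0.1, 2.0, 1.0]).astype(complex)

subset, snr = select_subset(Rs, Rn, k=2)
print(subset, snr)
```

In this toy setting the selected pair is determined by the per-microphone ratio of target power to noise power, so a quiet microphone with very low noise can be preferred over a louder one. Exhaustive search is only practical for small networks; a greedy or relaxed search would be needed at scale.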
External IDs: doi:10.1109/taslpro.2025.3574841