Abstract: In multichannel speech enhancement (SE) systems, deep neural networks (DNNs) are often used to directly estimate the clean speech for effective beamforming. This approach, however, may not generalize adequately to unseen acoustic or noise conditions. Alternatively, DNNs can perform SE indirectly by predicting time-frequency masks of the speech and noise patterns to assist classic statistical beamformers. Although robust, this approach is constrained by the statistical component's reliance on certain modeling assumptions, e.g., covariance-based modeling in the minimum-variance-distortionless-response (MVDR) beamformer. In this paper, we propose a novel integration of the two methodologies by introducing an intra-MVDR module embedded in a U-Net beamformer, combining the merits of both: effectiveness and robustness. Experiments show that the intra-MVDR module yields improvements that are not achievable by simply enlarging the baseline SE network.
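For context on the covariance-based modeling the abstract refers to, the sketch below shows the standard narrowband MVDR weight computation for a single frequency bin. This is not the paper's proposed intra-MVDR module; it is a minimal illustration of the classical beamformer, and all concrete values (array size, covariance, steering vector) are hypothetical.

```python
import numpy as np

def mvdr_weights(R_n: np.ndarray, d: np.ndarray) -> np.ndarray:
    """MVDR weights for one frequency bin: minimize output noise power
    w^H R_n w subject to the distortionless constraint w^H d = 1, giving
        w = R_n^{-1} d / (d^H R_n^{-1} d)."""
    Rn_inv_d = np.linalg.solve(R_n, d)       # R_n^{-1} d without an explicit inverse
    return Rn_inv_d / (d.conj() @ Rn_inv_d)  # normalize so that w^H d = 1

# Toy example: 4-mic array with a synthetic noise covariance (hypothetical values).
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + M * np.eye(M)          # Hermitian positive-definite noise covariance
d = np.exp(-1j * np.pi * 0.3 * np.arange(M))  # plane-wave steering vector

w = mvdr_weights(R_n, d)
print(abs(w.conj() @ d))  # distortionless response toward d: magnitude is 1.0
```

In mask-based pipelines, R_n is typically estimated by averaging the outer products of noisy observations weighted by a DNN-predicted noise mask, which is where the statistical assumptions the abstract mentions enter.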