Over-Determined Semi-Blind Speech Source Separation

Published: 01 Jan 2021, Last Modified: 06 Feb 2025, APSIPA ASC 2021, CC BY-SA 4.0
Abstract: We propose a semi-blind speech source separation method that jointly optimizes several acoustic functions, i.e., speech source separation (SS), dereverberation (DR), acoustic echo reduction (AE), and background noise reduction (BG). Instead of a cascade connection of SS, AE, and DR, the proposed method performs DR, SS, and AE with a unified time-invariant filter. We assume the over-determined condition in which the number of microphones Nm is larger than the number of near-end speech sources Ns. Joint optimization of DR, SS, AE, and BG can be performed by using the Nm - Ns dimensional subspace of the time-invariant filter for BG. Furthermore, we reveal that this subspace can also be utilized for residual acoustic echo reduction (RR), in which the residual acoustic echo signal is reduced by spatial filtering. Two types of joint parameter optimization techniques for DR, SS, AE, BG, and RR are proposed, based on vectorwise coordinate descent and fast multichannel nonnegative matrix factorization. Experimental results show that the proposed methods perform DR, SS, AE, and BG better than a cascade method. When the acoustic echo path between a loudspeaker and a microphone is time-varying, the proposed methods with RR achieve a larger performance improvement than the proposed method without RR.
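
To make the unified-filter idea concrete, below is a minimal NumPy sketch of how an STFT-domain input stacked from current microphone frames (for SS), delayed microphone frames (for DR), and loudspeaker reference frames (for AE) can be mapped by a single time-invariant filter per frequency bin onto Ns source outputs plus an (Nm - Ns)-dimensional subspace available for BG and RR. All dimensions, variable names, and the random placeholder filter are illustrative assumptions for exposition only; the paper estimates the filter via vectorwise coordinate descent or fast multichannel NMF, which is not reproduced here.

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's experimental settings)
Nm = 4           # number of microphones
Ns = 2           # number of near-end speech sources (over-determined: Nm > Ns)
Nr = 1           # number of loudspeaker (far-end reference) channels
D, L = 2, 3      # prediction delay and number of past taps used for dereverberation
F, T = 257, 200  # STFT frequency bins and frames

rng = np.random.default_rng(0)
X = rng.standard_normal((F, T, Nm)) + 1j * rng.standard_normal((F, T, Nm))  # mic STFT
R = rng.standard_normal((F, T, Nr)) + 1j * rng.standard_normal((F, T, Nr))  # loudspeaker STFT

def stacked_input(Xf, Rf, D, L):
    """Stack current mics (SS), delayed mics (DR), and references (AE) per frame."""
    feats = [Xf]                             # current microphone frame
    for tau in range(D, D + L):              # delayed frames for dereverberation
        Xd = np.zeros_like(Xf)
        Xd[tau:] = Xf[:-tau]
        feats.append(Xd)
    feats.append(Rf)                         # far-end references for echo reduction
    return np.concatenate(feats, axis=-1)    # shape (T, Nm + L*Nm + Nr)

# One unified time-invariant filter per frequency bin maps the stacked input to
# Nm outputs: Ns separated sources plus an (Nm - Ns)-dimensional subspace that can
# absorb background noise and residual acoustic echo.
f = 0
Z = stacked_input(X[f], R[f], D, L)          # (T, K)
K = Z.shape[-1]
W = rng.standard_normal((Nm, K)) + 1j * rng.standard_normal((Nm, K))  # placeholder filter
Y = Z @ W.T                                  # (T, Nm)
sources, noise_subspace = Y[:, :Ns], Y[:, Ns:]
```

The sketch only illustrates the structure of the joint filter; in practice the filter coefficients are what the two proposed optimization techniques estimate.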