Abstract: This paper presents a fully Bayesian hierarchical model for blind audio source separation in a noisy environment. Our probabilistic approach is based on Gaussian priors for the speech signals, Gamma hyperpriors for the speech precisions, and a Gamma prior for the noise precision. The time-varying acoustic channels are modelled with a linear-Gaussian state-space model. Inference is carried out using a variational Expectation-Maximization (VEM) algorithm. This leads to a variant of the multi-speaker multichannel Wiener filter (MCWF) that separates and enhances the audio sources, and to a Kalman smoother that infers the acoustic channels. The VEM speech estimator can be decomposed into two stages: a multi-speaker linearly constrained minimum variance (LCMV) beamformer followed by a variational multi-speaker postfilter. The proposed algorithm is evaluated in a static scenario using recorded room impulse responses (RIRs) with two reverberation levels, and it shows superior performance compared with competing methods.
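The two-stage decomposition of the speech estimator can be illustrated with a small numerical sketch. It is not the authors' VEM implementation. It makes several assumptions: a single frequency bin, a known mixing matrix `H` (M microphones, J speakers, M ≥ J, full column rank), fixed speech variances `phi_s` in place of the variational expectations, and spatially white noise with covariance `I / tau` for a scalar noise precision `tau`. All function names are hypothetical. The sketch checks that the direct MCWF estimate equals a distortionless multi-speaker LCMV beamformer followed by the postfilter `(Gamma + Phi^{-1})^{-1} Gamma`, where `Gamma = tau * H^H H`.

```python
import numpy as np


def mcwf_direct(H, phi_s, tau, x):
    """Hypothetical direct MCWF (MMSE) estimate of s from x = H s + n.

    Assumes s ~ CN(0, diag(phi_s)) and n ~ CN(0, I / tau) in one frequency bin.
    """
    M = H.shape[0]
    Phi = np.diag(phi_s)
    Rx = H @ Phi @ H.conj().T + np.eye(M) / tau  # observation covariance
    return Phi @ H.conj().T @ np.linalg.solve(Rx, x)


def lcmv(H, x):
    """Multi-speaker LCMV with distortionless constraints W^H H = I.

    With spatially white noise, the noise precision cancels out.
    """
    HH = H.conj().T
    return np.linalg.solve(HH @ H, HH @ x)


def multispeaker_postfilter(H, phi_s, tau, s_lcmv):
    """Postfilter G = (Gamma + Phi^{-1})^{-1} Gamma applied to the LCMV output."""
    Gamma = tau * H.conj().T @ H
    G = np.linalg.solve(Gamma + np.diag(1.0 / phi_s), Gamma)
    return G @ s_lcmv


def mcwf_two_stage(H, phi_s, tau, x):
    """Beamformer followed by postfilter; should match mcwf_direct."""
    return multispeaker_postfilter(H, phi_s, tau, lcmv(H, x))


# Small synthetic example: 4 microphones, 2 speakers (illustrative values only).
rng = np.random.default_rng(0)
M, J = 4, 2
H = rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))
phi_s = np.array([2.0, 0.5])
tau = 10.0
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)

s_direct = mcwf_direct(H, phi_s, tau, x)
s_two_stage = mcwf_two_stage(H, phi_s, tau, x)
print(np.allclose(s_direct, s_two_stage))
```

The equivalence follows from the push-through identity `Phi H^H (H Phi H^H + I/tau)^{-1} = (tau H^H H + Phi^{-1})^{-1} tau H^H`. In the full variational scheme, `H`, `phi_s`, and `tau` would be replaced by their posterior expectations, so this sketch shows only the structure of the estimator.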