Abstract: Voice activity detection is a necessary preprocessing step for many applications, such as channel identification and speech recognition. The problem can be solved even under noisy conditions by exploiting characteristics of speech and noise signals. However, when several speakers are active simultaneously, these methods are generally unreliable, since multiple speech signals may overlap completely in the time-frequency plane. A new approach is suggested that, owing to its incorporation of higher-order statistics, is also applicable in multi-speaker scenarios. Independent component analysis is used to obtain estimates of the clean speech and of the angle of incidence for each speaker. These estimates then help to correctly identify the active speaker and to perform voice activity detection. The suggested approach is robust to noise as well as to interfering speech and can detect the presence of single speakers in mixtures of speech and noise, even under highly reverberant conditions at a signal-to-interference ratio (SIR) of 0 dB.
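The abstract does not specify the algorithm in detail, so the following is only a minimal sketch of the general idea under strong simplifying assumptions: two microphones, instantaneous (non-reverberant) mixing, and gated Gaussian noise as a stand-in for speech. A kurtosis-maximizing rotation of the whitened mixtures serves as a toy form of independent component analysis, and a frame-energy threshold serves as the per-speaker voice activity detector. All function names, the frame length, and the threshold are hypothetical, and angle-of-incidence estimation is not covered.

```python
import math
import random

FRAME = 200  # samples per analysis frame (illustrative value)

def whiten(x1, x2):
    """Center two channels, decorrelate them, and scale to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    caa = sum(v * v for v in a) / n
    cbb = sum(v * v for v in b) / n
    cab = sum(p * q for p, q in zip(a, b)) / n
    # Closed-form rotation that diagonalizes the 2x2 covariance matrix.
    phi = 0.5 * math.atan2(2.0 * cab, caa - cbb)
    c, s = math.cos(phi), math.sin(phi)
    u = [c * p + s * q for p, q in zip(a, b)]
    v = [-s * p + c * q for p, q in zip(a, b)]
    su = math.sqrt(sum(t * t for t in u) / n)
    sv = math.sqrt(sum(t * t for t in v) / n)
    return [t / su for t in u], [t / sv for t in v]

def kurtosis(y):
    """Excess kurtosis of a zero-mean signal (a fourth-order statistic)."""
    n = len(y)
    m2 = sum(t * t for t in y) / n
    m4 = sum(t ** 4 for t in y) / n
    return m4 / (m2 * m2) - 3.0

def separate(x1, x2, steps=90):
    """Toy 2x2 ICA: pick the rotation of the whitened data that maximizes
    the summed absolute kurtosis of the two outputs."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        th = (math.pi / 2.0) * k / steps
        c, s = math.cos(th), math.sin(th)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def frame_vad(y, frame=FRAME, rel_threshold=0.5):
    """Mark a frame active if its energy exceeds a fraction of the mean frame energy."""
    energies = [sum(t * t for t in y[i:i + frame]) / frame
                for i in range(0, len(y) - frame + 1, frame)]
    mean_e = sum(energies) / len(energies)
    return [e > rel_threshold * mean_e for e in energies]

def gated_noise(activity, rng, frame=FRAME):
    """Stand-in for speech: Gaussian noise in active frames, silence otherwise."""
    return [rng.gauss(0.0, 1.0) if on else 0.0
            for on in activity for _ in range(frame)]

def demo(seed=0):
    rng = random.Random(seed)
    act_a = [k % 4 < 2 for k in range(20)]   # speaker A: on, on, off, off, ...
    act_b = [k % 2 == 0 for k in range(20)]  # speaker B: on, off, on, off, ...
    a = gated_noise(act_a, rng)
    b = gated_noise(act_b, rng)
    # Assumed instantaneous mixing at two microphones.
    x1 = [p + 0.6 * q for p, q in zip(a, b)]
    x2 = [0.5 * p + q for p, q in zip(a, b)]
    y1, y2 = separate(x1, x2)
    return act_a, act_b, frame_vad(y1), frame_vad(y2)
```

Calling `demo()` returns the true activity patterns of both synthetic speakers together with the activity detected on each separated output. The outputs come back in arbitrary order and sign, which is the usual ambiguity of independent component analysis; in the approach described above, the estimated angles of incidence would be the natural way to assign each output to a speaker. Reverberant rooms require convolutive rather than instantaneous separation, which this sketch does not attempt.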