Abstract: Voice Liveness Detection (VLD) aims to protect speaker authentication from speech spoofing by determining whether speech comes from a live speaker or a loudspeaker. Previous methods mainly focus on signal-level differences between the two. In this paper, we propose Lombard-VLD, the first VLD method that exploits the human auditory feedback mechanism (i.e., the Lombard effect). The key idea is that live speakers physiologically and involuntarily adjust their speaking patterns in a noisy background, whereas loudspeakers cannot. Moreover, we design a reference-based dual-input mode and a differential SE-ResBlock to model the acoustic differences caused by the Lombard effect. Experimental results show that Lombard-VLD achieves equal error rates (EERs) of 0% and 0.24% on two datasets, outperforming state-of-the-art methods. It is robust to various environmental factors, including speaker distance, speaker posture, and environmental noise, with an average accuracy above 98.51%. It also generalizes well to unseen speakers, genders, and datasets, with EERs below 2.68%, 3.44%, and 7.32%, respectively. This work demonstrates the advantages of the Lombard effect for VLD: fewer constraints on users and better detection performance.
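Since the results above are reported as EER, the following minimal sketch illustrates how an equal error rate is commonly approximated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely live", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(live_scores, spoof_scores):
    """Approximate the EER of a liveness detector.

    Assumption: a higher score means "more likely live", and a sample
    is accepted as live when its score is >= the threshold.
    """
    if not live_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so the "reject everything" operating point is covered.
    thresholds = sorted(set(live_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: loudspeaker samples accepted as live.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: live samples rejected as loudspeaker.
        frr = sum(s < t for s in live_scores) / len(live_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is taken where FAR and FRR are closest to equal.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, not from the paper).
separable = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.2, 0.5, 0.3])
```

With perfectly separated scores the sketch returns an EER of 0, which is the situation a reported 0% EER corresponds to; one overlapping pair out of four samples per class yields 0.25.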