Analysis and Mitigation of Vocal Effort Variations in Speaker Recognition

Mahesh Kumar Nandwana, Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson

ICASSP 2019 (modified: 03 May 2023)

Abstract: In this work, we assess the impact of vocal effort on the discrimination and calibration performance of a state-of-the-art speaker recognition system. We analyze three levels of vocal effort (low, normal, and high) from the SRI-FRTIV corpus. Using a deep neural network (DNN) speaker-embedding system with probabilistic linear discriminant analysis (PLDA), we find that vocal effort variation significantly degrades system performance. To improve system robustness, we apply both mixture PLDA (mix-PLDA) and trial-based calibration with condition PLDA similarity (TBC-CPLDA). On the SRI-FRTIV corpus, the proposed approaches yield 18% and 33% relative improvements in discrimination and calibration performance, respectively.
