Audio-visual gender recognition in uncontrolled environment using variability modeling techniques

Laurent El Shafey, Elie Khoury, Sébastien Marcel

Published: 2014, Last Modified: 15 Nov 2024, IJCB 2014, CC BY-SA 4.0

Abstract: The problem of gender recognition using visual and acoustic cues has recently received significant attention. This paper explores the use of Total Variability (i-vector) and Inter-Session Variability (ISV) modeling techniques for both unimodal and bimodal gender recognition, and compares them to several state-of-the-art algorithms. The experimental evaluation is conducted on the FERET and LFW databases for face-based gender recognition, on the NIST-SRE database for audio-based gender recognition, and on the MOBIO database for audio-visual gender recognition. On LFW, the i-vector technique reaches an accuracy rate of about 95%, outperforming state-of-the-art algorithms based on Support Vector Machines (SVM) applied to raw pixels, Local Binary Patterns (LBP), or Gabor features. On NIST-SRE, the i-vector system is also superior to state-of-the-art GMM-based gender recognition systems, with a relative gain of about 11%. Finally, results on MOBIO show that both i-vectors and ISV benefit from combining visual and acoustic cues through logistic regression; the resulting bimodal systems achieve accuracy rates of about 98%.
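The abstract states only that visual and acoustic cues are combined with logistic regression. The following is a minimal sketch of such score-level fusion, assuming each unimodal system (face and voice) emits one scalar score per trial, with higher scores favouring one gender class, and that a labelled development set is available to train the fusion weights. The helper names (`train_fusion`, `fuse`, `accuracy`), the plain gradient-descent trainer, its learning rate and epoch count, and the toy scores are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fuse(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Fused score (log-odds): weighted sum of unimodal scores plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_fusion(scores: Sequence[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit w, b by batch gradient descent on the mean cross-entropy loss.

    scores: one row per trial, one column per modality, e.g. [face, voice].
    labels: 1 for one gender class, 0 for the other.
    lr and epochs are illustrative defaults, not tuned values.
    """
    n, dim = len(scores), len(scores[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(scores, labels):
            err = sigmoid(fuse(x, w, b)) - y  # d(loss)/d(log-odds)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(scores, labels, w, b) -> float:
    """Fraction of trials where the sign of the fused score matches the label."""
    hits = sum((fuse(x, w, b) > 0) == (y == 1) for x, y in zip(scores, labels))
    return hits / len(labels)


# Toy [face, voice] scores, invented for illustration only: each modality
# alone misclassifies one trial per class, but their sum separates the classes.
dev_scores = [[1.0, 1.0], [2.0, -0.5], [-0.5, 2.0], [1.5, 1.5],
              [-1.0, -1.0], [-2.0, 0.5], [0.5, -2.0], [-1.5, -1.5]]
dev_labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_fusion(dev_scores, dev_labels)
face_only = accuracy(dev_scores, dev_labels, [1.0, 0.0], 0.0)
print(f"face only: {face_only:.2f}")  # face only: 0.75
print(f"fused: {accuracy(dev_scores, dev_labels, w, b):.2f}")  # fused: 1.00
```

Thresholding the fused log-odds at zero corresponds to a posterior probability of 0.5, and the learned weights indicate how much each modality is trusted. In practice an established logistic-regression implementation would likely replace this hand-rolled trainer; the sketch only shows why fusing two complementary scores can correct errors that each modality makes on its own.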