Inferring Age from Linguistic and Verbal Cues in Celebrity Interviews

Yunting Yin, Steven Skiena

Published: 01 Jan 2023, Last Modified: 21 Jan 2025, FAIML 2023, CC BY-SA 4.0

Abstract: Current research shows that humans can accurately infer a person's age from their speech, but automatic age recognition remains challenging because few public corpora contain realistic data. We add age annotations to VoxCeleb1 and VoxCeleb2, turning them into longitudinal datasets suitable for age prediction tasks. The annotated dataset contains 730,281 utterances from 4,432 speakers. On average, each celebrity appears 165 times in the final dataset, which allows analysis of aging over time. We perform both age group classification and age regression on the audio and transcripts of celebrity interviews. We achieve a classification accuracy of 46.1% on female subjects and 40.0% on male subjects. For age regression, we achieve a mean absolute error of 8.22 years using audio features combined with sentence embeddings. To quantify how well our predictions reflect the individual process of aging, we calculate, for each celebrity, the number of points that must be removed to make the age estimates monotonically increasing over time. The aging process is best captured for celebrities in the child and young-adult groups, in which an average of 1.25 and 1.57 points need to be removed for female and male subjects respectively, with p-values of 0.006 and 0.019.
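The monotonicity measure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the count is computed, so the code below rests on two assumptions: that "monotonically increasing" allows ties (non-decreasing), and that the minimum number of removals is obtained as the sequence length minus the length of its longest non-decreasing subsequence. The function name `points_to_remove` and the sample values are hypothetical and do not come from the paper.

```python
from bisect import bisect_right


def points_to_remove(predicted_ages):
    """Minimum number of points to drop so the rest is non-decreasing.

    `predicted_ages` is assumed to be one speaker's age estimates,
    already sorted by recording date (an assumption for illustration).
    """
    # tails[k] holds the smallest possible last value of a
    # non-decreasing subsequence of length k + 1 seen so far.
    tails = []
    for age in predicted_ages:
        # bisect_right lets equal values extend a subsequence,
        # which makes the subsequence non-decreasing rather than strict.
        i = bisect_right(tails, age)
        if i == len(tails):
            tails.append(age)
        else:
            tails[i] = age
    # Everything outside the longest such subsequence must be removed.
    return len(predicted_ages) - len(tails)


# Hypothetical estimates for one speaker, in chronological order.
estimates = [30, 31, 29, 32, 33]
print(points_to_remove(estimates))
```

Under these assumptions, a speaker whose estimates already rise over time scores 0, and larger counts indicate predictions that track the aging trajectory less faithfully.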