Classification of story-telling and poem recitation using head gesture of the talker

C. A. Valliappan, Anurag Das, Prasanta Kumar Ghosh

Published: 01 Jan 2018, Last Modified: 11 Sept 2024, SPCOM 2018, CC BY-SA 4.0

Abstract: In this work, we investigate the nature of head gestures in spontaneous speech during story-telling in comparison to those during poem recitation. We hypothesize that head gestures during poem recitation are more repetitive and structured than those during spontaneous speech. To quantify this, we propose a measure called degree of repetition (DoR). We also perform a story-telling vs. poem recitation classification experiment using a deep neural network (DNN), with both DoR and context-dependent raw head gesture data used as features. Analysis and experiments are performed on a database of 24 subjects, each telling five stories, and a different set of 10 subjects, each reciting 20 poems three times, thus yielding data of comparable duration for story-telling and poem recitation. Analysis of head gestures using DoR reveals that the DoR is, on average, higher during poem recitation than during story-telling. In a four-fold classification experiment between story-telling and poem recitation, the DNN with raw head gestures achieves an average classification accuracy of 85.79% and an average F-score of 89.05%, while DoR features yield an average accuracy of 80.59% and an average F-score of 82.30%. This indicates that the features learnt by the DNN from raw head gestures are more discriminative than DoR. Although this accuracy and F-score are lower than those obtained with acoustic features such as Mel frequency cepstral coefficients (MFCCs) (94.67% and 95.60%, respectively), raw head gestures and MFCCs together yield a higher average accuracy (98.62%) and F-score (98.92%), indicating that head gestures are complementary to acoustic features for this classification task.
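The results above are reported as averages of accuracy and F-score over four folds. As a rough illustration of how such figures can be computed, the sketch below derives per-fold accuracy and F-score for a binary task and averages them across folds. It is not the authors' evaluation code: the function names `fold_metrics` and `average_over_folds` are hypothetical, and it assumes that poem recitation is labeled 1 and treated as the positive class for the F-score and that averages are taken per fold (the abstract states neither).

```python
from typing import List, Sequence, Tuple


def fold_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                 positive: int = 1) -> Tuple[float, float]:
    """Return (accuracy, F-score), both in percent, for one fold.

    Assumption: label 1 (e.g. poem recitation) is the positive class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = 100.0 * correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f_score = 100.0 * 2 * tp / denom if denom else 0.0
    return accuracy, f_score


def average_over_folds(
        folds: List[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average per-fold accuracy and F-score (hypothetical averaging scheme)."""
    results = [fold_metrics(t, p) for t, p in folds]
    accs = [a for a, _ in results]
    fs = [f for _, f in results]
    return sum(accs) / len(accs), sum(fs) / len(fs)


if __name__ == "__main__":
    # Toy labels only, not data from the paper.
    toy_folds = [([1, 1, 0, 0], [1, 0, 0, 0]), ([1, 0], [1, 0])]
    print(average_over_folds(toy_folds))
```

Averaging per-fold scores (rather than pooling all predictions first) is one common convention; pooling can give slightly different numbers when folds differ in size or class balance.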