Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition

Akshay Kalkunte Suresh, Srinivasa Raghavan K. M., Prasanta Kumar Ghosh

Published: 2017, Last Modified: 30 Sept 2024INTERSPEECH 2017EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: We consider the problem of automatically detecting if a speaker is suffering from common cold from his/her speech. When a speaker has symptoms of cold, his/her voice quality changes compared to the normal one. We hypothesize that such a change in voice quality could be reflected in lower likelihoods from a model built using normal speech. In order to capture this, we compute a 120-dimensional posteriorgram feature in each frame using Gaussian mixture model from 120 states of 40 three-states phonetic hidden Markov models trained on approximately 16.4 hours of normal English speech. Finally, a fixed 5160-dimensional phoneme state posteriorgram (PSP) feature vector for each utterance is obtained by computing statistics from the posteriorgram feature trajectory. Experiments on the 2017-Cold sub-challenge data show that when the decisions from bag-of-audio-words (BoAW) and end-to-end (e2e) are combined with those from PSP features with unweighted majority rule, the UAR on the development set becomes 69% which is 2.9% (absolute) better than the best of the UARs obtained by the baseline schemes. When the decisions from ComParE, BoAW and PSP features are combined with simple majority rule, it results in a UAR of 68.52% on the test set.