Abstract: Multimodal annotations add important cues for understanding \textit{how} a conversation proceeded. In this paper, we further extend the automated conversation annotation system MONAH with \textit{pitch} and \textit{volume} annotations, making it the state-of-the-art automatic annotation system in terms of the number of aspects annotated automatically. MONAHv3 provides an automated solution that is competitive with the widely used, manual Jefferson transcription system. In automatic evaluations, the additions significantly improve supervised learning in ten out of fifteen experiments. In human evaluations of emotion guessing, the additions significantly outperformed the Jefferson transcription system. In terms of usability, human evaluations also showed that the system is significantly more usable than the Jefferson system. Lastly, human evaluations indicated that the additions significantly improve paralinguistic annotations (describing tone and volume) over MONAHv2, elevating MONAHv3 to be comparable with Jefferson in paralinguistics. MONAHv3 was already, and remains, more competitive in kinesics (describing actions).
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English