Abstract: Multimodal annotations add important cues for understanding \textit{how} a conversation proceeded. In this paper, we further extend the automated conversation annotation system MONAH with \textit{pitch} and \textit{volume} annotations, making it the state-of-the-art automatic annotation system in terms of the number of aspects annotated automatically. MONAHv3 provides an automated solution that is competitive with the widely used, manual Jefferson transcription system. In automatic evaluations, the additions significantly improve supervised learning in ten out of fifteen experiments. In human evaluations of emotion guessing, the additions significantly outperformed the Jefferson transcription system. In terms of usability, human evaluations also showed that the system is significantly more usable than the Jefferson system. Lastly, human evaluations indicated that the additions significantly improve paralinguistic annotations (describing tone and volume) over MONAHv2, elevating MONAHv3 to be comparable with Jefferson in paralinguistics. MONAHv3 was already, and remains, more competitive in kinesics (describing actions).
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English