Prosody Detection improves Pretrained Automatic Speech Recognition

ACL ARR 2024 June Submission2557 Authors

15 Jun 2024 (modified: 22 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: We show that the performance of Automatic Speech Recognition (ASR) systems that use semi-supervised speech representations can be boosted by a complementary prosody detection module, by introducing a joint ASR and prosody detection model. The prosody detection component of our model achieves a significant improvement on the state-of-the-art for the task, closing the gap in F1-score by 41%. Additionally, joint training reduces the ASR word error rate (WER) by 28.3% on LibriSpeech under limited-resource fine-tuning. With these results, we show the importance of extending pretrained speech models to retain or relearn important prosodic cues.
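For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a reduction in WER are commonly computed. It is not the authors' evaluation code. The function names `word_error_rate` and `relative_reduction` are hypothetical, the sample numbers are illustrative and not taken from the paper, and the sketch assumes the 28.3% figure is a relative reduction, which the abstract does not state.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative transcripts, not drawn from LibriSpeech.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")

    # Illustrative WER values, not results from the paper.
    print(f"Relative reduction: {relative_reduction(0.10, 0.0717):.3f}")
```

Under the relative reading, a baseline WER of 10% would fall to roughly 7.2%; under an absolute reading the same figure would mean a drop of 28.3 percentage points, so the distinction matters when comparing against other reported results.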

Paper Type: Short

Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding

Research Area Keywords: Speech Recognition, Text-to-Speech and Spoken Language Understanding

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 2557
