Generative Error Correction for Emotion-aware Speech-to-text Translation

ACL ARR 2025 February Submission1310 Authors

13 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: This paper explores emotion-aware speech-to-text translation (ST) using generative error correction (GER) with large language models (LLMs). Despite recent advances in ST, the impact of emotional content has been overlooked. First, we enhance the translation of emotional speech by adopting the GER paradigm: we fine-tune an LLM to generate the translation from the decoded N-best hypotheses. Moreover, we incorporate emotion and sentiment labels into the LLM fine-tuning process so that the model can take the emotional content into account. In addition, we project the ST model's latent representation into the LLM embedding space to further improve emotion recognition and translation. Experiments on an English-Chinese dataset show that combining GER, emotion and sentiment labels, and the projector is effective for emotion-aware ST. We will release our code to the public.
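The GER setup described in the abstract can be illustrated with a minimal sketch of how one fine-tuning example might be assembled from N-best hypotheses. This is not the authors' implementation: the function name `build_ger_example`, the prompt wording, and the choice to place the emotion and sentiment labels in the target text (rather than in the input) are all assumptions made for illustration, since the abstract does not specify the format.

```python
from typing import Dict, Optional, Sequence


def build_ger_example(
    hypotheses: Sequence[str],
    reference: str,
    emotion: Optional[str] = None,
    sentiment: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble one (prompt, target) pair for GER-style fine-tuning.

    Hypothetical format: the prompt lists the N-best ST hypotheses, and the
    target holds optional emotion/sentiment labels followed by the reference
    translation. The real format used in the paper is not given.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")

    # Prompt: numbered N-best list, best hypothesis first.
    prompt_lines = ["N-best translation hypotheses:"]
    for rank, hyp in enumerate(hypotheses, start=1):
        prompt_lines.append(f"{rank}. {hyp.strip()}")
    prompt_lines.append("Corrected translation:")

    # Target: labels (assumed to be prediction targets) then the reference.
    target_lines = []
    if emotion is not None:
        target_lines.append(f"Emotion: {emotion}")
    if sentiment is not None:
        target_lines.append(f"Sentiment: {sentiment}")
    target_lines.append(reference.strip())

    return {"prompt": "\n".join(prompt_lines), "target": "\n".join(target_lines)}
```

Calling `build_ger_example(["hyp a", "hyp b"], "ref", emotion="joy", sentiment="positive")` yields a prompt with two numbered hypotheses and a target whose last line is the reference translation. The projector that maps ST latent representations into the LLM embedding space is not covered by this sketch.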
Paper Type: Short
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: spoken language translation
Contribution Types: NLP engineering experiment
Languages Studied: English, Chinese, Japanese, German
Submission Number: 1310