A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

Zeinab Sadat Taghavi; Ali Satvaty; Hossein Sameti

01 Mar 2023 (modified: 27 Apr 2025). Submitted to Tiny Papers @ ICLR 2023. Readers: Everyone

Keywords: speech emotion recognition, modality conversion, machine learning, deep learning

TL;DR: We propose converting speech to text (modality conversion) to improve speech emotion recognition performance on the MELD dataset: feeling speech in words.

Abstract: Speech Emotion Recognition (SER) is a challenging task. In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. We assess our approach through two experiments. In the first, a method named Modality-Conversion employs an automatic speech recognition (ASR) system followed by a text classifier. In the second, a method named Modality-Conversion++, we assume perfect ASR output and investigate the impact of modality conversion on SER. Our findings indicate that the first method yields substantial results, while the second method outperforms state-of-the-art (SOTA) speech-based approaches in terms of SER weighted-F1 (WF1) score on the MELD dataset. This research highlights the potential of modality conversion for tasks that can be conducted in alternative modalities.
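
The two-stage idea in the abstract (transcribe the speech, then classify the text) and the WF1 metric can be illustrated with a minimal sketch. This is not the authors' implementation: `modality_conversion`, `weighted_f1`, and the toy `transcribe` and `classify_text` callables are hypothetical names introduced here for illustration only, and the keyword classifier merely stands in for a trained text model. Passing reference transcripts with an identity `transcribe` function mimics the perfect-ASR assumption of Modality-Conversion++.

```python
from collections import Counter
from typing import Callable, List, Sequence


def modality_conversion(
    utterances: Sequence[object],
    transcribe: Callable[[object], str],
    classify_text: Callable[[str], str],
) -> List[str]:
    """Convert each utterance to text, then predict an emotion from the text.

    `transcribe` and `classify_text` are placeholders (assumptions) for an
    ASR system and a text emotion classifier, respectively.
    """
    return [classify_text(transcribe(u)) for u in utterances]


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


def toy_classifier(text: str) -> str:
    """Hypothetical keyword rule standing in for a trained text classifier."""
    lowered = text.lower()
    if "happy" in lowered:
        return "joy"
    if "hate" in lowered:
        return "anger"
    return "neutral"


if __name__ == "__main__":
    # Perfect-ASR setting: the inputs are already reference transcripts,
    # so the "ASR" step is the identity function.
    transcripts = ["I am so happy", "I hate this", "See you at noon"]
    gold = ["joy", "anger", "neutral"]
    predictions = modality_conversion(transcripts, lambda t: t, toy_classifier)
    print(predictions, weighted_f1(gold, predictions))
```

Keeping the transcription and classification steps as separate callables makes it straightforward to swap a real ASR system in for the identity function and compare the two settings under the same metric.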

Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/a-change-of-heart-improving-speech-emotion/code)
