Exploring the Use of Large Language Models and Interpretable Features for Explainable Speech Emotion Recognition

Qifei Li, Yingming Gao, Yuhua Wen, Yingying Zhou, Zheng Lian, Bin Liu, Zhengqi Wen, Jianhua Tao, Ya Li

Published: 2026, Last Modified: 30 May 2026. Venue: IEEE J. Sel. Top. Signal Process. 2026. License: CC BY-SA 4.0
Abstract: Speech emotion recognition (SER) has advanced significantly in recent years, driven by its critical role in human-computer interaction. However, current studies predominantly rely on discriminative recognition methods, which can classify emotions but provide no insight into the reasoning behind the classification. Recently, researchers have started using large language models (LLMs) for explainable SER. Existing studies follow two main approaches. The first relies on manually annotated information as the basis for the LLM's emotion explanations, but such annotation is costly. The second converts speech information into textual descriptions that serve as input to the LLM, but these descriptions often contain limited detail, which may lose emotion-related information and thereby degrade performance. To address these issues, we first propose an automated method for annotating explainable speech emotion datasets, which reduces annotation costs by using interpretable speech features, rather than manually annotated subjective information, as the basis for the LLM's emotion explanations. Second, we propose a generative, LLM-based explainable SER method called SEmoLLM, which uses WavLM to encode raw speech signals as input to the LLM, avoiding the loss of emotion-related information. Finally, we evaluate the proposed method on four emotion datasets. The experimental results demonstrate that SEmoLLM performs comparably to discriminative emotion recognition methods while also enabling basic speech emotion explanation. The results also show that generating descriptions of gender, pitch, or volume can improve emotion recognition performance. The proposed method and findings provide a new perspective on explainability research in emotion-related tasks.