Abstract: Psychological emotion is typically either categorized into discrete emotion states, such as anger, happiness, and neutrality, or estimated as degrees within a two-dimensional continuous valence-arousal (VA) space. Previous studies on multimodal emotion recognition have employed fusion mechanisms across modalities but treated discrete emotion labels and VA degrees as separate recognition tasks. By modeling the relationship between these two types of labels, it becomes possible to leverage training datasets with different label types to improve multimodal emotion recognition. In this study, we explore the use of multiple label types by employing a 2D Kernel Density Estimation (2D-KDE) method to mathematically model their relations. We then propose a label fusion layer (LFL) based on these relations to adjust the predicted probabilities of emotion states produced by existing multimodal emotion recognition baselines. Through extensive experiments, we demonstrate that our proposed model improves emotion recognition performance and achieves superior results on the IEMOCAP and OMG-Emotion datasets.
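To make the label-relation modeling concrete, below is a minimal Python sketch of the 2D-KDE idea, not the paper's exact method: the synthetic VA annotations, the three-label set, and the normalization of densities into fusion weights are all illustrative assumptions. It fits one 2D KDE per discrete emotion label with scipy.stats.gaussian_kde and evaluates the resulting densities at a predicted VA point, the kind of signal a label fusion layer could use to adjust a classifier's predicted probabilities.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical training annotations: each sample carries a discrete
# emotion label and a continuous (valence, arousal) pair in [-1, 1]^2.
rng = np.random.default_rng(0)
va_points = {
    "anger":      rng.normal([-0.6, 0.6], 0.15, size=(200, 2)),
    "happiness":  rng.normal([0.7, 0.5], 0.15, size=(200, 2)),
    "neutrality": rng.normal([0.0, 0.0], 0.15, size=(200, 2)),
}

# Fit one 2D KDE per discrete label, i.e. p(valence, arousal | label).
# gaussian_kde expects data of shape (n_dims, n_samples).
kdes = {label: gaussian_kde(pts.T) for label, pts in va_points.items()}

# Evaluate each label's density at a predicted VA point; normalizing
# over labels gives soft compatibility weights that a label fusion
# layer could combine with the network's emotion-state probabilities.
va_pred = np.array([[0.65], [0.45]])  # shape (2, 1): one query point
densities = np.array([kdes[label](va_pred)[0] for label in kdes])
weights = densities / densities.sum()
for label, w in zip(kdes, weights):
    print(f"{label}: {w:.3f}")
```

Under these assumptions, a VA prediction near high valence and moderate arousal assigns most of its weight to "happiness", illustrating how continuous VA estimates can inform the discrete-label head.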