Heterogeneous Graph Embedding for Multimodal Multi-Label Emotion Recognition

Published: 2025 · Last Modified: 12 Nov 2025 · ICMR 2025 · CC BY-SA 4.0
Abstract: Multimodal Multi-label Emotion Recognition (MMER) aims to identify human emotions from multiple modalities. Previous studies mainly focus on aligning cross-modal data to extract discriminative emotion-dependent features using attention- or reconstruction-based strategies, while overlooking the fact that the MMER task is also subject to the multi-label noise inherent in multi-label classification, which disturbs modality-to-label correlations. Moreover, most prior work fails to balance the modeling of internal label correlations against the label dependencies of individual modalities under noisy conditions. In this paper, we propose a novel Heterogeneous Graph Embedding (HGE) method for the MMER task, which exploits heterogeneous graphs to extract emotional commonality at the modal and temporal levels and explicitly models cross-modal correlations among heterogeneous modalities. In addition, it captures the uncertainty introduced by multi-label noise and leverages the unevenness of multi-label distributions to overcome potential data issues. Experimental results demonstrate that our HGE method achieves state-of-the-art performance on two widely used multimodal multi-label emotion recognition datasets under both noise-free and noisy conditions.
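To make the core architectural idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration (not the authors' implementation): per-timestep features from each modality become nodes of a heterogeneous graph, cross-modal edges link nodes at the same timestep, temporal edges link consecutive timesteps within a modality, and one round of edge-type-aware message passing is pooled into a graph embedding for multi-label prediction. All names here (`ToyHGELayer`, `build_adjacency`, the dimensions) are assumptions for illustration only.

```python
# Hypothetical sketch of a heterogeneous graph over modality/timestep nodes,
# with separate weights per edge type -- not the paper's actual model.
import torch
import torch.nn as nn

class ToyHGELayer(nn.Module):
    """One message-passing step with distinct weights for cross-modal
    and temporal edges, a common pattern for heterogeneous graphs."""
    def __init__(self, dim):
        super().__init__()
        self.w_cross = nn.Linear(dim, dim)  # cross-modal edges
        self.w_temp = nn.Linear(dim, dim)   # temporal edges
        self.w_self = nn.Linear(dim, dim)   # self-loop

    def forward(self, x, adj_cross, adj_temp):
        # x: (N, dim) node features; adj_*: (N, N) row-normalized adjacency
        return torch.relu(
            self.w_self(x)
            + adj_cross @ self.w_cross(x)
            + adj_temp @ self.w_temp(x)
        )

def build_adjacency(num_modalities, num_steps):
    """Cross-modal edges connect same-timestep nodes across modalities;
    temporal edges connect consecutive timesteps within one modality."""
    n = num_modalities * num_steps
    a_cross, a_temp = torch.zeros(n, n), torch.zeros(n, n)
    idx = lambda m, t: m * num_steps + t
    for t in range(num_steps):
        for m1 in range(num_modalities):
            for m2 in range(num_modalities):
                if m1 != m2:
                    a_cross[idx(m1, t), idx(m2, t)] = 1.0
    for m in range(num_modalities):
        for t in range(num_steps - 1):
            a_temp[idx(m, t), idx(m, t + 1)] = 1.0
            a_temp[idx(m, t + 1), idx(m, t)] = 1.0
    # Row-normalize so aggregation is a mean over neighbors.
    norm = lambda a: a / a.sum(dim=1, keepdim=True).clamp(min=1.0)
    return norm(a_cross), norm(a_temp)

# Toy forward pass: 3 modalities (e.g. text/audio/video), 4 timesteps, 6 labels.
dim, num_mod, num_steps, num_labels = 16, 3, 4, 6
x = torch.randn(num_mod * num_steps, dim)      # projected per-modality features
a_cross, a_temp = build_adjacency(num_mod, num_steps)
h = ToyHGELayer(dim)(x, a_cross, a_temp)       # heterogeneous message passing
logits = nn.Linear(dim, num_labels)(h.mean(dim=0))  # mean-pool graph embedding
probs = torch.sigmoid(logits)  # independent per-label probabilities (multi-label)
```

Note the sigmoid head: unlike single-label softmax classification, multi-label emotion recognition predicts each label independently, which is also where the label noise discussed in the abstract enters.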