Smile: Spiking Multi-Modal Interactive Label-Guided Enhancement Network for Emotion Recognition

Published: 14 Jul 2024, Last Modified: 05 Jun 2025
2024 IEEE International Conference on Multimedia and Expo (ICME) · CC BY 4.0
Abstract: Multi-modal multi-label emotion recognition has gained significant attention in affective computing, as it leverages signals from multiple modalities to distinguish complex emotions accurately. However, previous studies primarily focus on capturing invariant representations and neglect the fluctuation of temporal information, which affects model robustness. In this paper, we propose a novel Spiking Multi-modal Interactive Label-guided Enhancement network (SMILE). It introduces a spiking neural network with dynamic thresholds, allowing flexible processing of temporal information to enhance model robustness, and employs scale spiking fusion to enrich semantic information. In addition to modality-specific refinement, SMILE integrates modality-interactive exploration and label-modality matching modules to capture multi-modal interaction and label-modality dependence. Experimental results on the benchmark datasets CMU-MOSEI and NEMu demonstrate the superiority of SMILE over state-of-the-art models. Notably, SMILE achieves a significant 28.5% improvement in accuracy over the benchmark method on the NEMu dataset.
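The abstract's central mechanism is a spiking neuron whose firing threshold adapts over time. The sketch below is a minimal, illustrative leaky integrate-and-fire neuron with such a dynamic threshold; it is an assumption about the general technique, not the paper's actual formulation, and all names and constants are hypothetical.

```python
# Minimal sketch (not the paper's implementation) of a leaky integrate-and-fire
# neuron with a dynamic threshold, i.e. a threshold that rises after each spike
# and decays back toward a baseline, letting the neuron adapt to fluctuating
# temporal input. All parameter names and values are illustrative.
import numpy as np

def lif_dynamic_threshold(inputs, tau_mem=0.9, tau_thr=0.95,
                          base_thr=1.0, thr_jump=0.5):
    """Run a LIF neuron over a 1-D sequence of input currents and return spikes."""
    v = 0.0                 # membrane potential
    thr = base_thr          # adaptive (dynamic) threshold
    spikes = []
    for x in inputs:
        v = tau_mem * v + x             # leaky integration of the input current
        s = 1.0 if v >= thr else 0.0    # spike when the potential crosses the threshold
        spikes.append(s)
        v = v * (1.0 - s)               # hard reset of the potential on a spike
        # threshold decays toward its baseline and jumps up after each spike
        thr = base_thr + tau_thr * (thr - base_thr) + thr_jump * s
    return np.array(spikes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    current = rng.uniform(0.0, 0.6, size=50)   # toy input sequence
    print(lif_dynamic_threshold(current))
```

In this kind of scheme, larger input fluctuations temporarily raise the threshold, which damps the spike rate and is one common way to make spiking models more robust to temporal noise.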