DeepMSI-MER: Enhancing Multimodal Emotion Recognition through Contrastive Semantic Alignment and Visual Sequence Compression
Abstract: With the advancement of artificial intelligence and computer vision technologies, multimodal emotion recognition has become a prominent research topic. However, existing methods face challenges such as heterogeneous data fusion and the effective utilization of modality correlations. This paper proposes a novel multimodal emotion recognition approach, DeepMSI-MER, based on the integration of contrastive learning and visual sequence compression. The proposed method enhances cross-modal feature fusion through contrastive learning and reduces redundancy in the visual modality by leveraging visual sequence compression. Experimental results on two public datasets, IEMOCAP and MELD, demonstrate that DeepMSI-MER significantly improves the accuracy and robustness of emotion recognition, validating the effectiveness of multimodal feature fusion and the proposed approach.
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Sentiment Analysis
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 2124
Loading