DeepMSI-MER: Enhancing Multimodal Emotion Recognition through Contrastive Semantic Alignment and Visual Sequence Compression

ACL ARR 2025 May Submission2124 Authors

18 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Advances in artificial intelligence and computer vision have made multimodal emotion recognition a prominent research topic. However, existing methods still struggle to fuse heterogeneous data and to exploit correlations between modalities. This paper proposes DeepMSI-MER, a multimodal emotion recognition approach that combines contrastive learning with visual sequence compression: contrastive learning strengthens cross-modal feature fusion, and visual sequence compression reduces redundancy in the visual modality. Experiments on two public datasets, IEMOCAP and MELD, show that DeepMSI-MER significantly improves the accuracy and robustness of emotion recognition, supporting the effectiveness of the proposed multimodal feature fusion.

Paper Type: Long

Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining

Research Area Keywords: Sentiment Analysis

Contribution Types: Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 2124
