Enhancing Multimodal Sentiment Recognition Based on Cross-Modal Contrastive Learning

Published: 01 Jan 2024 · Last Modified: 06 Jun 2025 · ICME 2024 · CC BY-SA 4.0
Abstract: In recent years, multimodal sentiment recognition has gained attention for its potential to boost accuracy by combining information from multiple sources. To address the challenge of heterogeneity across modalities, we present Cross-Modal Contrastive Learning (CMCL), a novel framework that integrates diversity, consistency, and sample-level contrastive learning to enhance multimodal feature representation. Diversity contrastive learning separates modality-specific features into distinct spaces to capture their complementarity, while consistency contrastive learning aligns representations of the same sample across modalities. Our approach outperforms existing baselines on three benchmark datasets, setting a new state of the art.
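The abstract does not give the loss formulation, but the consistency objective it describes is commonly realized as an InfoNCE-style contrastive loss that pulls together paired embeddings from two modalities and pushes apart in-batch negatives. The sketch below is an illustrative reconstruction under that assumption, not the authors' actual implementation; the function name, modalities, and temperature value are hypothetical.

```python
import math

def _normalize(v):
    """L2-normalize a vector (list of floats)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def consistency_contrastive_loss(text_emb, audio_emb, temperature=0.1):
    """Illustrative InfoNCE-style consistency loss between two modalities.

    Embeddings at the same index are treated as a positive pair
    (same sample seen through two modalities); all other in-batch
    pairs act as negatives. Lower loss = better cross-modal alignment.
    """
    t = [_normalize(v) for v in text_emb]
    a = [_normalize(v) for v in audio_emb]
    loss = 0.0
    for i in range(len(t)):
        # Cosine similarities of text sample i to every audio sample,
        # scaled by the temperature.
        logits = [sum(x * y for x, y in zip(t[i], a[j])) / temperature
                  for j in range(len(a))]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the positive target.
        loss += -(logits[i] - log_sum)
    return loss / len(t)
```

As a sanity check, perfectly aligned modality embeddings should yield a lower loss than mismatched ones, since the positive pair dominates the softmax.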