Dual Stream Alignment with Hierarchical Bottleneck Fusion For Multimodal Sentiment Analysis

ACL ARR 2024 June Submission 4266 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Multimodal sentiment analysis (MSA) leverages multiple modalities, such as text, image, and audio, for a comprehensive understanding of sentiment, but faces challenges such as temporal misalignment and modality heterogeneity. We propose a Dual-stream Alignment with Hierarchical Bottleneck Fusion (DAHB) method to address these issues. Our approach achieves comprehensive alignment through temporal alignment via cross-attention and semantic alignment via contrastive learning, ensuring alignment in both the time dimension and the feature space. Moreover, supervised contrastive learning is applied to refine these features. For modality fusion, we employ a hierarchical bottleneck method, progressively reducing the number of bottleneck tokens to compress information and using bi-directional cross-attention to learn interactions between modalities. We conduct experiments on MOSI, MOSEI, and CH-SIMS, and the results show that DAHB achieves state-of-the-art performance on a range of metrics. Ablation studies demonstrate the effectiveness of our methods. The code is available at url.
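The sketch below illustrates the hierarchical bottleneck fusion idea described in the abstract: a shrinking set of bottleneck tokens exchanges information with two modality streams via bi-directional cross-attention. Only two streams are shown for brevity, and all module names, dimensions, and the token-reduction schedule are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn


class BottleneckFusionLayer(nn.Module):
    """One fusion layer: bottleneck tokens compress each modality, then
    each modality reads the compressed context back (bi-directional)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # Bottleneck tokens gather information from each modality ...
        self.bott_from_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bott_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        # ... and each modality attends back to the bottleneck.
        self.text_from_bott = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_from_bott = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, audio, bottleneck):
        # Compress: bottleneck queries attend to modality tokens.
        b, _ = self.bott_from_text(bottleneck, text, text)
        b, _ = self.bott_from_audio(b, audio, audio)
        bottleneck = bottleneck + b
        # Broadcast: modality tokens attend back to the bottleneck.
        text = text + self.text_from_bott(text, bottleneck, bottleneck)[0]
        audio = audio + self.audio_from_bott(audio, bottleneck, bottleneck)[0]
        return text, audio, bottleneck


class HierarchicalBottleneckFusion(nn.Module):
    def __init__(self, dim: int = 128, num_tokens: int = 8, num_layers: int = 3):
        super().__init__()
        # Learnable bottleneck tokens shared across the batch.
        self.bottleneck = nn.Parameter(torch.randn(1, num_tokens, dim))
        self.layers = nn.ModuleList(
            [BottleneckFusionLayer(dim) for _ in range(num_layers)]
        )

    def forward(self, text, audio):
        # text: (B, T_text, dim), audio: (B, T_audio, dim)
        bott = self.bottleneck.expand(text.size(0), -1, -1)
        for layer in self.layers:
            text, audio, bott = layer(text, audio, bott)
            # Hierarchical compression: halve the bottleneck tokens per layer
            # (an assumed schedule for the "progressively reducing" step).
            bott = bott[:, : max(1, bott.size(1) // 2), :]
        # Pool the remaining bottleneck tokens into one fused representation.
        return bott.mean(dim=1)


if __name__ == "__main__":
    model = HierarchicalBottleneckFusion()
    fused = model(torch.randn(2, 20, 128), torch.randn(2, 50, 128))
    print(fused.shape)  # torch.Size([2, 128])
```

In this reading, the ever-smaller bottleneck acts as an information funnel between modalities, while the bi-directional attention lets each stream both contribute to and benefit from the compressed representation; how DAHB combines this with the temporal and semantic alignment streams is detailed in the paper itself.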
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: emotion detection and analysis
Contribution Types: NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 4266