GRACE: GRadient-based Active Learning with Curriculum Enhancement for Multimodal Sentiment Analysis

Published: 20 Jul 2024, Last Modified: 06 Aug 2024, MM 2024 Poster, License: CC BY 4.0
Abstract: Multimodal sentiment analysis (MSA) aims to predict sentiment from the text, audio, and visual data of videos. Existing works focus on designing fusion strategies or decoupling mechanisms, but they suffer from low data utilization and rely heavily on large amounts of labeled data. Acquiring large-scale annotations for MSA, however, is extremely labor-intensive and costly. To address this challenge, we propose GRACE, a GRadient-based Active learning method with Curriculum Enhancement, designed for MSA under a multi-task learning framework. Our approach reduces the annotation required by strategically selecting valuable samples from the unlabeled data pool while maintaining high performance. Specifically, we introduce informativeness and representativeness criteria, computed from gradient magnitudes and sample distances, to quantify the active value of unlabeled samples. Additionally, we incorporate an easiness criterion, based on the relationship between modality consistency and sample difficulty, to avoid selecting outliers. During learning, we dynamically balance sample difficulty against active value, guided by the curriculum learning principle. This strategy prioritizes easier, modality-aligned samples for stable initial training, then gradually increases difficulty by incorporating more challenging samples with modality conflicts. Extensive experiments on both multimodal sentiment regression and classification benchmarks demonstrate the effectiveness of our approach.
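The abstract names the selection criteria but not their formulas. The sketch below shows one way such a curriculum-weighted active-learning round could be scored. It is a minimal illustration under stated assumptions, not GRACE's actual method. The assumptions are: per-sample gradient norms are precomputed by some MSA model; representativeness is the inverse mean Euclidean distance to the rest of the pool; easiness is the inverse variance of hypothetical per-modality (text/audio/visual) predictions; informativeness and representativeness are averaged with equal weight; and the curriculum weight decays linearly over rounds. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def representativeness(features: Sequence[Sequence[float]]) -> List[float]:
    """Assumed density proxy: samples close to the rest of the pool score higher."""
    n = len(features)
    scores = []
    for i in range(n):
        dists = [math.dist(features[i], features[j]) for j in range(n) if j != i]
        mean_d = sum(dists) / len(dists) if dists else 0.0
        scores.append(1.0 / (1.0 + mean_d))
    return scores


def easiness(unimodal_preds: Sequence[Sequence[float]]) -> List[float]:
    """Assumed modality-consistency score: low spread across modalities = easy."""
    scores = []
    for preds in unimodal_preds:
        mean = sum(preds) / len(preds)
        var = sum((p - mean) ** 2 for p in preds) / len(preds)
        scores.append(1.0 / (1.0 + var))
    return scores


def select_batch(
    grad_norms: Sequence[float],
    features: Sequence[Sequence[float]],
    unimodal_preds: Sequence[Sequence[float]],
    round_idx: int,
    total_rounds: int,
    budget: int,
) -> List[int]:
    """Return indices of the `budget` highest-scoring unlabeled samples."""
    info = min_max(grad_norms)                      # informativeness (gradient magnitude)
    rep = min_max(representativeness(features))     # representativeness (sample distances)
    easy = min_max(easiness(unimodal_preds))        # easiness (modality consistency)
    active = [0.5 * (a + b) for a, b in zip(info, rep)]
    # Hypothetical linear curriculum: weight on easiness is 1 in the first
    # round and decays to 0 in the last, shifting focus to active value.
    lam = 1.0 - round_idx / max(total_rounds - 1, 1)
    scores = [lam * e + (1.0 - lam) * a for e, a in zip(easy, active)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    grad_norms = [0.1, 0.9, 0.5, 0.2]
    features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]]
    unimodal_preds = [[0.5, 0.5, 0.5], [0.9, -0.8, 0.1], [1.0, 1.0, 0.9], [0.2, 0.3, 0.2]]
    print(select_batch(grad_norms, features, unimodal_preds, 0, 3, 1))  # [0]
    print(select_batch(grad_norms, features, unimodal_preds, 2, 3, 2))  # [1, 3]
```

In the toy pool, the first round picks sample 0. Its three modality predictions agree perfectly, so it is the easiest sample. The last round instead picks samples 1 and 3. Sample 1 has the largest gradient norm but strongly conflicting modalities. This mirrors the easy-to-hard progression the abstract describes.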
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Our work uses text, audio, and visual data to analyze multimodal sentiment and incorporates active learning to achieve data efficiency. By reducing the amount of labeled data required, it addresses annotation scarcity in multimodal sentiment analysis and enables more data-efficient multimodal processing.
Supplementary Material: zip
Submission Number: 5069