agenticMSA: Agentic Multimodal Sentiment Analysis with Task-Specific and Large Language Model Collaboration
Abstract: Multimodal Sentiment Analysis (MSA) faces challenges due to inconsistencies between modalities, such as conflicting sentiment cues from visual, audio, and text data. These modality conflicts make it difficult for previous task-specific small-scale models to accurately predict sentiment. Although general multimodal large language models (MLLMs) perform well on conflicting or hard samples, they can occasionally make errors on simpler samples due to problems like hallucination or excessive reasoning. To address these issues, we propose agenticMSA, an agentic framework that integrates the strengths of conventional task-specific models and general MLLMs through planning, decision, and reflection agents. agenticMSA introduces a Modality Conflict Detection (MCD) module that identifies modality conflicts, allowing the framework to route simpler samples to task-specific models for efficient prediction. For samples with modality conflicts, we introduce two key modules: 1) Hybrid Collaboration (HC), where decision agents powered by both a task-specific model and an MLLM collaborate to resolve discrepancies; 2) Group Discussion (GD), where multiple MLLM-based decision agents discuss divergent predictions, guided by a reflection agent to reach a consensus. Extensive experiments demonstrate the effectiveness of agenticMSA, which achieves state-of-the-art performance on two popular datasets, CH-SIMS and CMU-MOSI.
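The routing behavior described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration of the MCD → HC → GD control flow; all function names, the score-divergence conflict criterion, and the majority-vote consensus are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of agenticMSA's routing logic; every name and
# heuristic here is an assumption for illustration only.

def detect_modality_conflict(text_score, audio_score, visual_score, threshold=1.0):
    """MCD (assumed form): flag a sample as conflicting when the
    per-modality sentiment scores diverge beyond a threshold."""
    scores = [text_score, audio_score, visual_score]
    return max(scores) - min(scores) > threshold

def predict(sample, task_model, mllm_agents, reflection_agent):
    """Route easy samples to the task-specific model; escalate conflicts."""
    if not detect_modality_conflict(*sample["scores"]):
        # Simple sample: efficient task-specific prediction.
        return task_model(sample)
    # Hybrid Collaboration: task-specific model and one MLLM agent compare.
    votes = [task_model(sample), mllm_agents[0](sample)]
    if votes[0] == votes[1]:
        return votes[0]
    # Group Discussion: multiple MLLM agents, consensus via reflection agent.
    votes = [agent(sample) for agent in mllm_agents]
    return reflection_agent(votes)
```

In this sketch, the reflection agent could be as simple as a majority vote over the decision agents' outputs; the paper's actual consensus mechanism is a guided discussion rather than a fixed rule.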
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Multimodal Sentiment Analysis
Contribution Types: NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 631