Keywords: Multimodal Sentiment Analysis, Prompt Learning, Vision-Language Models, Contrastive Learning
Abstract: Multimodal sentiment analysis aims to infer affective states from image–text pairs in social media. Most existing approaches rely on single-step fusion or static representations, treating affective cues as fixed rather than progressively refined. Meanwhile, prompt-based methods typically initialize prompts with sentiment-irrelevant text or random vectors, or inject auxiliary semantics in a single step, and thus fail to explicitly guide semantic evolution. To address these limitations, we propose a semantic-guided progressive framework with stage-wise prompt interaction (SPRO), which organizes multimodal supervision along a cognitively inspired trajectory from Tone to Emotion. Specifically, emotion understanding is decomposed into three successive stages (Tone, Content, and Emotion), corresponding to perceptual appraisal, semantic grounding, and affective reasoning. At each stage, LLM-generated structured captions provide explicit semantic guidance, while learnable multimodal prompts serve as a shared affective interface that progressively aligns visual and textual representations within a unified semantic space. Furthermore, a dual-path contrastive alignment strategy jointly optimizes image–category and text–category consistency, reinforcing cross-modal semantic agreement. Experiments demonstrate that SPRO achieves superior accuracy and interpretability compared with state-of-the-art methods. The source code is publicly available.
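The dual-path contrastive alignment mentioned above can be illustrated with a minimal sketch, assuming CLIP-style encoders and one learnable category (prompt) embedding per sentiment class. All names here (image_feats, text_feats, category_feats, tau) are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def dual_path_contrastive_loss(image_feats, text_feats, category_feats,
                               labels, tau=0.07):
    """Jointly align image->category and text->category similarities.

    image_feats:    (B, D) image embeddings
    text_feats:     (B, D) text embeddings
    category_feats: (C, D) learnable per-category prompt embeddings
    labels:         (B,)   ground-truth sentiment category indices
    """
    # L2-normalize so dot products are cosine similarities.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    category_feats = F.normalize(category_feats, dim=-1)

    # Temperature-scaled similarity logits for each path.
    img_logits = image_feats @ category_feats.t() / tau   # (B, C)
    txt_logits = text_feats @ category_feats.t() / tau    # (B, C)

    # Each path is a cross-entropy over categories; summing the two pulls
    # both modalities toward the same category anchor in embedding space.
    return F.cross_entropy(img_logits, labels) + F.cross_entropy(txt_logits, labels)
```

Summing the two cross-entropy terms is one plausible way to enforce image–category and text–category consistency jointly; the paper may weight or schedule the two paths differently.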
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Sentiment Analysis, Multimodality, Image–Text Matching
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 3228