Aspect-Aware Image Descriptions for Multimodal Aspect-Based Sentiment Analysis: A Unified Framework with Dual Similarity and Confidence Calibration

ACL ARR 2025 May Submission7297 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Multimodal Aspect-Based Sentiment Analysis (MABSA) involves identifying textual aspects, aligning them with visual evidence, and analyzing their sentiment. Existing approaches often suffer from error propagation and inefficient cross-modal reasoning. To address these challenges, we propose MADSC (Multimodal Aspect-aware Description with Similarity and Calibration), a unified framework that jointly performs Multimodal Aspect Term Extraction (MATE), MABSA, and Joint Multimodal Aspect Sentiment Analysis (JMASA) in an end-to-end manner. First, MADSC generates aspect-aware image descriptions by replacing generic object mentions with textual aspects, bridging the semantic gap between modalities. Second, a dual similarity alignment strategy combines textual-object and visual-region alignments, using bounding boxes as intermediaries. Third, a confidence calibration mechanism quantifies alignment uncertainty, while a modality gating mechanism suppresses irrelevant visual features for absent aspects, ensuring robust predictions. Experiments on benchmark datasets show that MADSC outperforms a wide range of state-of-the-art methods on the MATE, MABSA, and JMASA tasks.
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Multimodal Aspect-based Sentiment Analysis, Multimodal Named Entity Recognition, Large Language Models
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7297