EaNet: Enhanced Multimodal Awareness Alignment Network for Multimodal Aspect-Based Sentiment Analysis

Aoqiang Zhu, Min Hu, Xiaohua Wang, Yan Xing, Yiming Tang, Jiaoyun Yang, Ning An, Fuji Ren

Published: 01 Jan 2025 · Last Modified: 12 Nov 2025 · IEEE Transactions on Affective Computing · CC BY-SA 4.0
Abstract: Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to identify aspect-sentiment pairs from both text and images. Although notable progress has been made in aligning aspects with visual content, the implicit and subtle nature of language often leads to the absence of explicit aspect terms, making alignment challenging. Existing methods typically adopt a coarse strategy that aligns the entire image with the aspect, introducing noise from irrelevant or overlapping regions. Furthermore, different image regions may correspond to different textual aspects, causing sentiment signals to interfere with one another. To tackle these issues, we propose the Enhanced Multimodal Awareness Alignment Network (EaNet), which enables fine-grained aspect-region alignment while mitigating cross-modal interference. EaNet first uses a modality-adaptive encoder to preserve intra-modal features and suppress irrelevant signals, then applies aspect-aware and sentiment-aware modules to jointly improve alignment and denoising. To further improve the model's understanding of multimodal sentiment patterns and aspect-opinion semantics, we design four targeted pre-training tasks. In particular, to address implicit aspect scenarios arising from concise textual expressions, we introduce a large language model-guided module for implicit aspect-opinion generation. Experiments on three MABSA subtasks show that EaNet achieves state-of-the-art performance.
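The sketch below illustrates, at a high level, the pipeline the abstract describes: modality-adaptive gating of text and image features, aspect-aware attention over image regions, and a sentiment-aware refinement step. It is a minimal PyTorch illustration only; the module names, dimensions, gating, attention, and classification choices are assumptions made for readability, not the paper's actual implementation.

# Speculative sketch of the EaNet-style forward pass described in the abstract.
# All component names and design details here are illustrative assumptions.
import torch
import torch.nn as nn

class EaNetSketch(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Modality-adaptive gates: keep intra-modal features, damp irrelevant signals.
        self.text_gate = nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid())
        self.image_gate = nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid())
        # Aspect-aware alignment: aspect queries attend to individual image regions.
        self.aspect_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Sentiment-aware refinement: aligned features attend back to the text context.
        self.senti_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, 3)  # negative / neutral / positive

    def forward(self, text_feats, image_feats, aspect_feats):
        # text_feats:   (B, Lt, d) token features from a text encoder
        # image_feats:  (B, Lv, d) region features from a vision encoder
        # aspect_feats: (B, La, d) aspect-term (or generated implicit-aspect) features
        text_feats = text_feats * self.text_gate(text_feats)
        image_feats = image_feats * self.image_gate(image_feats)
        # Fine-grained aspect-region alignment rather than whole-image fusion.
        aligned, _ = self.aspect_attn(aspect_feats, image_feats, image_feats)
        # Sentiment-aware denoising against the textual context.
        refined, _ = self.senti_attn(aligned, text_feats, text_feats)
        pooled = torch.cat([aligned.mean(dim=1), refined.mean(dim=1)], dim=-1)
        return self.classifier(pooled)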