Abstract: Existing approaches to Multimodal Aspect-Based Sentiment Analysis (MABSA) have three drawbacks: (i) aspect extraction and sentiment classification are only loosely coupled, overlooking aspect correlations and leading to inaccurate analysis of indirectly described aspects; (ii) most methods coarsely treat image pixels equally, introducing visual noise that compromises sentiment analysis accuracy; and (iii) many rely on extra pre-trained image-text relation detection networks, limiting their generality. To address these issues, we propose the Joint modal Circular Complementary attention framework (JCC), which jointly optimizes aspect extraction and sentiment classification by incorporating global text to enhance the model's awareness of aspect correlations. JCC uses text to highlight salient visual content, mitigating the impact of visual noise. Furthermore, we design the Circular Attention module (CIRA) for general feature-focused aspect extraction and the Modal Complementary Attention module (MCA) for detailed information-focused sentiment classification. Experimental results across three MABSA subtasks demonstrate the superiority of JCC over existing methods.