MASANet: Multi-Aspect Semantic Auxiliary Network for Visual Sentiment Analysis

Published: 01 Jan 2024 · Last Modified: 13 Nov 2024 · IEEE Trans. Affect. Comput. 2024 · CC BY-SA 4.0
Abstract: Recent work on multi-modal affective computing has shown that introducing multi-modal information can improve performance. However, multi-modal research faces significant challenges because of its demanding requirements for data acquisition, modal completeness, and feature alignment. The widespread availability of multi-modal pre-training methods makes it possible to aid visual sentiment analysis by introducing cross-domain knowledge. This paper proposes a Multi-Aspect Semantic Auxiliary Network (MASANet) for visual sentiment analysis. Specifically, MASANet expands the input modality through cross-modal generation, which makes cross-domain semantic assistance possible. A cross-modal gating module and an adaptive modal fusion module are then proposed for aspect-level and cross-modal interaction, respectively. In addition, a semantic polarity constraint loss is designed to improve multi-class sentiment classification. Evaluations on eight widely used affective image datasets demonstrate that the proposed method outperforms state-of-the-art methods. Ablation studies and visualization results further confirm the effectiveness of the proposed method and its modules.
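To make the described pipeline concrete, below is a minimal PyTorch sketch of how cross-modal gating, adaptive modal fusion, and a polarity-constrained loss could be wired together. The abstract does not specify the actual designs, so every module name, shape, gate form, and the exact loss are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalGate(nn.Module):
    """Hypothetical gate: weights each dimension of the generated text
    feature by its relevance to the paired image feature."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, img_feat, txt_feat):
        # Sigmoid gate conditioned on both modalities, applied to the text side.
        g = torch.sigmoid(self.gate(torch.cat([img_feat, txt_feat], dim=-1)))
        return g * txt_feat


class AdaptiveFusion(nn.Module):
    """Hypothetical adaptive fusion: learns per-sample weights over modalities."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, img_feat, txt_feat):
        stacked = torch.stack([img_feat, txt_feat], dim=1)   # (B, 2, D)
        w = F.softmax(self.score(stacked), dim=1)            # (B, 2, 1)
        return (w * stacked).sum(dim=1)                      # (B, D)


def polarity_constraint_loss(logits, labels, class_polarity):
    """One guessed reading of a semantic polarity constraint: penalize the
    probability mass assigned to classes whose coarse polarity (e.g.
    0 = negative, 1 = positive) disagrees with the ground-truth class's."""
    probs = F.softmax(logits, dim=-1)                        # (B, C)
    wrong = class_polarity[labels].unsqueeze(1) != class_polarity.unsqueeze(0)
    return (probs * wrong.float()).sum(dim=-1).mean()


# Usage (shapes only): features from the image encoder and from the
# cross-modal generation step that produces the auxiliary text modality.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
fused = AdaptiveFusion(512)(img, CrossModalGate(512)(img, txt))
```

The gating-then-fusion ordering above is one plausible arrangement; the polarity loss would typically be added to the standard cross-entropy term with a weighting hyperparameter.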