KEBR: Knowledge Enhanced Self-Supervised Balanced Representation for Multimodal Sentiment Analysis

Published: 01 Jan 2024 · Last Modified: 20 Feb 2025 · ACM Multimedia 2024 · License: CC BY-SA 4.0
Abstract: Multimodal sentiment analysis (MSA) aims to integrate multiple modalities of information to better understand human sentiment. Current research focuses mainly on multimodal fusion, neglecting the under-optimized modal representations that arise from imbalanced unimodal performance during joint learning. Moreover, the limited size of labeled datasets constrains the generalization ability of existing supervised models. To address these issues, this paper proposes a knowledge-enhanced self-supervised balanced representation approach (KEBR). First, a text-based cross-modal fusion method (TCMF) is constructed, which injects non-verbal information from the videos into the semantic representation of text to enhance the multimodal representation of text. Then, a multimodal cosine constrained loss (MCC) is designed to constrain the fusion of non-verbal information during joint learning and balance the representations. Finally, with the help of sentiment knowledge and non-verbal information, KEBR performs sentiment word masking and sentiment intensity prediction. Experimental results show that KEBR outperforms the baselines.
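To make the idea of a cosine-based constraint concrete, the following is a minimal sketch of what such a loss could look like. This is an illustration only, not the paper's actual MCC formulation: the function names, the use of the text representation as an anchor, and the `margin` parameter are all assumptions for exposition.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two representation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def mcc_loss_sketch(text_repr: np.ndarray, fused_repr: np.ndarray,
                    margin: float = 0.5) -> float:
    """Hypothetical cosine-constrained loss: penalizes the fused
    representation when its cosine similarity to the text anchor
    drops below a margin, discouraging the non-verbal modalities
    from dominating the joint representation.
    NOTE: `margin` and this hinge form are illustrative assumptions,
    not taken from the KEBR paper."""
    sim = cosine_similarity(text_repr, fused_repr)
    return max(0.0, margin - sim)
```

Under this sketch, a fused vector aligned with the text anchor incurs no penalty, while one orthogonal to it is penalized up to the margin, so the constraint only activates when fusion drifts away from the text semantics.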