Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning
Abstract: Designing an effective representation learning method for multimodal sentiment analysis is a critical research problem. The primary challenge is to capture both shared and private information in a comprehensive modal representation, especially when the task provides only uniform multimodal labels and fuses raw features directly. To overcome this challenge, we propose a novel deep modal shared information learning module that uses the covariance matrix to capture information shared across modalities. In addition, we introduce a label generation module based on a self-supervised learning strategy to capture the private information specific to each modality. Our module can be easily integrated into multimodal tasks, and its parameters can be adjusted to control the information exchange between modalities, allowing private or shared information to be learned as needed. To further improve performance, we adopt a multi-task learning strategy that enables the model to focus on modal differentiation during training. We also provide a detailed formulation derivation and feasibility proof for the design of the deep modal shared information learning module. To evaluate our approach, we conduct extensive experiments on three common multimodal sentiment analysis benchmark datasets. The experimental results validate the reliability of our model and demonstrate its effectiveness in capturing nuanced information in multimodal sentiment analysis tasks.
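The abstract states that the shared information module relies on the covariance matrix of modality representations, but does not spell out the exact formulation. The following is a minimal, hypothetical sketch, assuming a Frobenius-norm distance between per-modality covariance matrices (a CORAL-style alignment term); the function names, tensor shapes, and loss scaling are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: one way a covariance matrix can be used to align two
# modality representations. This is NOT the paper's exact module; it only
# illustrates a covariance-based shared-information objective.
import torch


def covariance(feats: torch.Tensor) -> torch.Tensor:
    """Unbiased covariance matrix of a (batch, dim) feature tensor."""
    centered = feats - feats.mean(dim=0, keepdim=True)
    return centered.T @ centered / (feats.shape[0] - 1)


def shared_info_loss(text_repr: torch.Tensor, audio_repr: torch.Tensor) -> torch.Tensor:
    """Penalize the squared Frobenius distance between modality covariances."""
    cov_t = covariance(text_repr)
    cov_a = covariance(audio_repr)
    d = text_repr.shape[1]
    return torch.linalg.matrix_norm(cov_t - cov_a, ord="fro") ** 2 / (4 * d * d)


if __name__ == "__main__":
    # Dummy encoder outputs for a batch of 32 samples with 128-dim features.
    text = torch.randn(32, 128)
    audio = torch.randn(32, 128)
    print(shared_info_loss(text, audio).item())
```

In such a setup, this alignment term would typically be added to the main sentiment prediction loss with a tunable weight, which is one way the "parameter adjustment to control the information exchange between modalities" mentioned above could be realized.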