Boosting Modality Representation With Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis
Abstract: Sentiment analysis has traditionally leveraged information from text data. More recently, it has become increasingly clear that multimodal data provides a rich space to drastically boost the interpretation of human sentiment by harnessing information across multiple modalities. In this study, we incorporate pre-trained feature extractors and propose a multi-task training strategy to improve modality representations for Multimodal Sentiment Analysis (MSA). The experimental results on the CH-SIMS v2 dataset demonstrate the superior performance of the proposed system compared to existing state-of-the-art methods, validating the effectiveness of our proposed approach. Furthermore, our framework reduces reliance on textual data, achieving competitive outcomes even when utilizing only auditory and visual modalities.
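To make the general idea of combining pre-trained modality features with multi-task training concrete, the following is a minimal PyTorch sketch. It is not the paper's actual architecture: the feature dimensions, projection layers, auxiliary-loss weight, and head designs are all illustrative assumptions; it only shows the common pattern of supervising unimodal heads alongside a fused multimodal prediction.

```python
import torch
import torch.nn as nn

class MultiTaskMSAModel(nn.Module):
    """Illustrative multi-task MSA model: each modality gets its own projection
    and prediction head (auxiliary tasks), plus a fused multimodal head.
    All sizes below are assumptions, not the paper's configuration."""

    def __init__(self, dim_t=768, dim_a=512, dim_v=512, hidden=128):
        super().__init__()
        # Project features pre-extracted by (frozen) pre-trained encoders
        # into a shared hidden size.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(dim_t, hidden),
            "audio": nn.Linear(dim_a, hidden),
            "visual": nn.Linear(dim_v, hidden),
        })
        # One regression head per modality plus a fused head.
        self.heads = nn.ModuleDict({
            m: nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))
            for m in ["text", "audio", "visual"]
        })
        self.fusion_head = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, 1))

    def forward(self, feats):
        hs = {m: self.proj[m](feats[m]) for m in self.proj}
        uni = {m: self.heads[m](hs[m]) for m in hs}  # unimodal predictions
        fused = self.fusion_head(torch.cat(list(hs.values()), dim=-1))
        return uni, fused


def multitask_loss(uni, fused, label, aux_weight=0.3):
    """Main multimodal regression loss plus weighted unimodal auxiliary losses
    (aux_weight is a hypothetical hyperparameter)."""
    mse = nn.functional.mse_loss
    loss = mse(fused.squeeze(-1), label)
    for pred in uni.values():
        loss = loss + aux_weight * mse(pred.squeeze(-1), label)
    return loss


# Toy usage with random utterance-level features and sentiment labels.
model = MultiTaskMSAModel()
feats = {"text": torch.randn(8, 768),
         "audio": torch.randn(8, 512),
         "visual": torch.randn(8, 512)}
label = torch.rand(8) * 2 - 1  # sentiment intensity in [-1, 1]
uni, fused = model(feats)
multitask_loss(uni, fused, label).backward()
```

One design point this sketch illustrates: because every modality has its own supervised head, the audio and visual branches receive direct training signal rather than relying solely on the fused prediction, which is consistent with the abstract's claim of reduced reliance on the text modality.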