Abstract: Multimodal sentiment analysis, which has attracted widespread attention in recent years, aims to predict human emotional states from multimodal data. Previous studies have focused primarily on improving multimodal fusion and integrating information across modalities, while overlooking the impact of noisy data on the internal features of each individual modality. In this paper, we propose the Enhanced experts with Uncertainty-Aware Routing (EUAR) method, which addresses the influence of noisy data on multimodal sentiment analysis by capturing uncertainty and dynamically altering the network. Specifically, we introduce the Mixture of Experts (MoE) approach into multimodal sentiment analysis for the first time, leveraging its conditional computation to dynamically alter the network in response to different types of noisy data. In particular, we refine the experts within the MoE framework so that they capture uncertainty in the data and extract clearer features. We also introduce a novel routing mechanism. Through our proposed U-loss, which uses the uncertainty quantified by the experts, the network learns to route each sample to experts with lower uncertainty, thus obtaining clearer features that are less affected by noise. Experimental results demonstrate that our method achieves state-of-the-art performance on three widely used multimodal sentiment analysis datasets. Moreover, experiments on noisy datasets show that our approach outperforms existing methods in handling noisy data. Our anonymized implementation is available at https://anonymous.4open.science/r/EUAR-7BF6.
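The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not define U-loss, so the loss below is an assumption: a hypothetical stand-in that takes the expected expert uncertainty under the softmax gate. The function names (`softmax`, `uncertainty_weighted_loss`, `route`) and the toy uncertainty values are illustrative only and are not taken from the EUAR implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_weighted_loss(router_logits, expert_uncertainties):
    """Hypothetical stand-in for U-loss (an assumption, not the paper's formula).

    Computes the expected uncertainty under the gate distribution, so the
    value shrinks when the router puts more weight on low-uncertainty experts.
    """
    gates = softmax(router_logits)
    return sum(g * u for g, u in zip(gates, expert_uncertainties))


def route(router_logits):
    """Top-1 routing: return the index of the expert with the largest gate."""
    gates = softmax(router_logits)
    return max(range(len(gates)), key=gates.__getitem__)


# Toy per-expert uncertainties for one sample (made-up values).
uncertainties = [0.9, 0.2, 0.5]

# A router that prefers the noisy expert 0 versus the confident expert 1.
loss_prefers_noisy = uncertainty_weighted_loss([2.0, 0.0, 0.0], uncertainties)
loss_prefers_confident = uncertainty_weighted_loss([0.0, 2.0, 0.0], uncertainties)

print(loss_prefers_confident < loss_prefers_noisy)
print(route([0.0, 2.0, 0.0]))
```

Under this assumed loss, gradient descent on the router logits would shift gate mass toward experts that report lower uncertainty, which is the behaviour the abstract attributes to the uncertainty-aware routing.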
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: The Multimodal Sentiment Analysis (MSA) task aims to teach computers to understand human sentiment from text, audio, and visual modalities. It is a typical multimodal processing task and fits the scope of the ACM Multimedia Conference well. In this paper, we propose a novel MSA method that takes the first step toward bringing the Mixture of Experts (MoE) structure to the MSA task. Comprehensive experiments demonstrate the effectiveness of the proposed method. All contributions in this paper focus on cutting-edge techniques in multimedia emotion analysis.
Supplementary Material: zip
Submission Number: 1941