Keywords: Multimodal Learning; Multimodal Sentiment Analysis; Mixture-of-Experts; Robustness
Abstract: Multimodal sentiment models are typically developed and evaluated on curated benchmarks where text, audio, and vision are well-aligned and reliable. In deployment, modality quality varies across instances (e.g., noise, missing sensors), and modalities can disagree, making fixed fusion strategies brittle. We present $\textbf{RIDER-MoE}$, a mixture-of-experts architecture that routes each example among modality-uniqueness (U), redundancy (R), and synergy (S) experts based on estimated modality reliability and cross-modal agreement. The expert decomposition is motivated by Partial Information Decomposition (PID), and we operationalize the intended U/R/S semantics via masked-view Disentangled Interaction Regularization during training. The router augments the fused representation with lightweight unimodal sentiment probes: high probe entropy and low consensus down-weight synergistic fusion and shift probability mass toward redundancy or unimodal experts. On CMU-MOSI and CMU-MOSEI, RIDER-MoE is competitive with recent strong baselines on clean test sets and achieves the best robustness (highest normalized AUC) across noise, missing-modality, and cross-modal conflict stress tests. These results support reliability-aware expert routing as a practical mechanism for robust multimodal sentiment analysis.
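To make the routing mechanism described in the abstract concrete, here is a minimal PyTorch sketch of a reliability-aware gate: unimodal probes produce per-modality sentiment distributions, their mean entropy and pairwise consensus feed the gate, and a penalty on the synergy expert's logit shifts probability mass toward redundancy and unimodal experts when reliability cues are poor. The class name, the five-expert layout, the entropy/consensus features, and the penalty form are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReliabilityAwareRouter(nn.Module):
    """Hypothetical sketch of reliability-aware routing over
    uniqueness (U), redundancy (R), and synergy (S) experts."""

    def __init__(self, dim: int, n_classes: int, alpha: float = 1.0):
        super().__init__()
        # One lightweight sentiment probe per modality (text, audio, vision).
        self.probes = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(3))
        # Five experts assumed here: U_text, U_audio, U_vision, R, S (S last).
        self.gate = nn.Linear(dim + 2, 5)
        self.alpha = alpha  # strength of the synergy down-weighting (assumed)

    def forward(self, fused: torch.Tensor, unimodal: list[torch.Tensor]):
        # Per-modality sentiment distributions from the probes.
        probs = [F.softmax(p(h), dim=-1) for p, h in zip(self.probes, unimodal)]
        # Mean probe entropy: high values signal unreliable modalities.
        entropy = torch.stack(
            [-(p * p.clamp_min(1e-8).log()).sum(-1) for p in probs]
        ).mean(0).unsqueeze(-1)                              # (batch, 1)
        # Consensus: mean pairwise agreement of the probe distributions.
        pairs = [(0, 1), (0, 2), (1, 2)]
        consensus = torch.stack(
            [(probs[i] * probs[j]).sum(-1) for i, j in pairs]
        ).mean(0).unsqueeze(-1)                              # (batch, 1)
        # The gate sees the fused representation plus both reliability cues.
        logits = self.gate(torch.cat([fused, entropy, consensus], dim=-1))
        # High entropy / low consensus penalize the synergy expert's logit,
        # shifting mass toward the redundancy and unimodal experts.
        penalty = self.alpha * (entropy - consensus)
        logits = torch.cat([logits[:, :-1], logits[:, -1:] - penalty], dim=-1)
        return F.softmax(logits, dim=-1)                     # expert weights
```

Applying the penalty before the softmax keeps the down-weighting monotone in the reliability cues while leaving the gate free to learn the base routing; this is one plausible reading of the abstract's description, not the authors' confirmed implementation.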
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodal Learning; Sentiment Analysis; Mixture-of-Experts
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1396