Dynamic Semantic Routing for Multimodal Sentiment Analysis

20 Sept 2025 (modified: 06 Jan 2026), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: Multimodal Sentiment Analysis, Representation Learning, Semantic Factorization
Abstract: Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating heterogeneous modalities such as language, vision, and audio. However, multimodal data often suffer from internal semantic entanglement, ambiguous cues, and inconsistent modality contributions, which limit the effectiveness of unified representations. To address these challenges, we propose a Dynamic Semantic Routing Framework (DSRF) for the MSA task. First, we present a hierarchical semantic factorization module that disentangles each modality into four functionally independent representations (primary emotion, contextual cue, ambiguity, and noise), enabling fine-grained semantic modeling. Second, we introduce a semantic dynamic routing interaction mechanism that routes and aggregates the semantic factors through a capsule-inspired interaction process to reconstruct modality representations with high-order compositionality. Finally, we design an uncertainty-aware semantic fusion strategy that estimates the reliability of each semantic factor and adaptively integrates the factors across modalities, yielding robust sentiment prediction under modality inconsistency. Extensive experiments on four benchmark datasets demonstrate that our framework achieves state-of-the-art performance.
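The abstract gives no equations, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes that "capsule-inspired" routing resembles standard routing-by-agreement over the four factor vectors of one modality, and it substitutes plain inverse-variance weighting for the unspecified uncertainty-aware fusion. The function names `route` and `fuse`, the number of iterations, and the toy vectors are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squash(v):
    """Capsule squashing: keeps the direction, maps the norm into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]

def route(factors, iterations=3):
    """Routing-by-agreement (assumed): aggregate factor vectors into one output.

    factors: list of equal-length vectors, e.g. the four semantic factors
    (primary emotion, contextual cue, ambiguity, noise) of one modality.
    Returns the aggregated vector and the coupling weights used to build it.
    """
    dim = len(factors[0])
    logits = [0.0] * len(factors)
    for _ in range(iterations):
        coupling = softmax(logits)
        # Weighted sum of the factors, then squash to a bounded norm.
        s = [sum(c * f[d] for c, f in zip(coupling, factors)) for d in range(dim)]
        out = squash(s)
        # Factors that agree with the output (larger dot product) gain weight.
        logits = [b + sum(a * o for a, o in zip(f, out))
                  for b, f in zip(logits, factors)]
    return out, coupling

def fuse(reps, variances, eps=1e-8):
    """Inverse-variance weighting (assumed stand-in for uncertainty-aware fusion).

    reps: one vector per modality; variances: one uncertainty estimate each.
    """
    weights = [1.0 / (v + eps) for v in variances]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(reps[0])
    return [sum(w * r[d] for w, r in zip(weights, reps)) for d in range(dim)]
```

For example, calling `route([[1.0, 0.0], [0.9, 0.1], [0.0, 0.2], [-0.1, 0.0]])` gives the first factor a larger coupling weight than the last, because the first agrees with the aggregated direction while the last opposes it. Calling `fuse` with equal variances reduces to a simple average of the modality vectors.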
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23770