HiTNet: Hippocampal-Thalamic Inspired Dual-Stream Network for Multimodal Sentiment Analysis under Missing Data

Yujuan Zhang; Qing Li; Xiuxing Li; Zhuo Wang; Ziyu Li; Xia Wu

20 Sept 2025 (modified: 11 Feb 2026); Submitted to ICLR 2026; License: CC BY 4.0

Keywords: Multimodal Sentiment Analysis, Data Missing, Hippocampal-Thalamic Inspired, Dual-Stream Network

Abstract: Multimodal sentiment analysis faces significant challenges under missing data, where random frame-level missingness occurring simultaneously across all modalities fragments emotional cues and produces heterogeneous data quality. Existing methods rely predominantly on cross-modal consistency for completion but often neglect the residual intra-modal information and fail to assess cross-modal reliability, introducing redundancy that degrades performance. Human cognitive systems are remarkably robust to incomplete perceptual input thanks to two functional mechanisms: hippocampal memory systems, which reconstruct missing content via pattern completion from stored semantic traces, and thalamic perceptual regulation, which dynamically integrates multisensory inputs while filtering out unreliable information. Inspired by these brain functions, we propose a Hippocampal-Thalamic dual-stream Network (HiTNet). The hippocampal-inspired intra-modal enhancement stream employs semantic memory modules with dynamic retrieval, together with sparse activation networks, to mine modality-specific information and reconstruct missing features. The thalamic-inspired inter-modal regulation stream implements confidence perception and adaptive cross-modal completion modules to dynamically integrate high-quality cross-modal information while suppressing redundant interference. Comprehensive experiments on MOSI, MOSEI, and SIMS show that HiTNet outperforms state-of-the-art methods, with average accuracy improvements of 1.5%–2.0% across all missing rates, and maintains 72.20% accuracy on MOSEI at an extreme 90% missing rate, validating the effectiveness of a brain function-inspired design for robust multimodal sentiment analysis. Our code is available at: https://anonymous.4open.science/r/HiTNet-8798/.
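The abstract describes its two streams only at a high level: memory-based reconstruction of missing features from residual intra-modal information with sparse activation, and confidence-aware cross-modal completion. The sketch below illustrates one generic way these two ideas can be combined for a single missing frame. It is not HiTNet's implementation (the linked repository holds that). Everything in it is an illustrative assumption: the hypothetical helpers `observed_ratio`, `sparse_memory_retrieve`, and `complete_modality`; memory keys and values supplied as fixed lists instead of learned parameters; a shared feature dimension across modalities; top-k attention standing in for "sparse activation"; and the fraction of observed frames used as a crude confidence proxy that linearly gates the intra-modal estimate against the cross-modal one.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]
Frames = List[Optional[Vector]]  # None marks a missing frame


def observed_ratio(frames: Frames) -> float:
    """Fraction of frames present; a hypothetical, crude confidence proxy."""
    return sum(f is not None for f in frames) / len(frames)


def weighted_mean(vectors: List[Vector], weights: List[float]) -> Vector:
    """Element-wise mean of `vectors`, normalised by the sum of `weights`."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def sparse_memory_retrieve(query: Vector, keys: List[Vector],
                           values: List[Vector], top_k: int = 1) -> Vector:
    """Scaled dot-product attention over memory slots that keeps only the
    top_k highest-scoring slots (an assumed form of sparse activation)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]  # numerically stable softmax
    return weighted_mean([values[i] for i in top], exps)


def complete_modality(streams: Dict[str, Frames], target: str,
                      keys: List[Vector], values: List[Vector],
                      top_k: int = 1) -> List[Vector]:
    """Fill the missing frames of `target` by gating an intra-modal memory
    estimate against a confidence-weighted cross-modal estimate."""
    frames = streams[target]
    observed = [f for f in frames if f is not None]
    # Intra-modal context: mean of the surviving frames (zeros if none survived).
    if observed:
        query = weighted_mean(observed, [1.0] * len(observed))
    else:
        query = [0.0] * len(keys[0])
    intra = sparse_memory_retrieve(query, keys, values, top_k)
    gate = observed_ratio(frames)  # trust the own modality more when mostly present
    completed: List[Vector] = []
    for t, frame in enumerate(frames):
        if frame is not None:
            completed.append(list(frame))
            continue
        # Other modalities observed at step t, each weighted by its own confidence.
        sources = [(streams[m][t], observed_ratio(streams[m]))
                   for m in streams if m != target and streams[m][t] is not None]
        if not sources:
            completed.append(list(intra))
            continue
        cross = weighted_mean([v for v, _ in sources], [w for _, w in sources])
        completed.append([gate * a + (1.0 - gate) * b for a, b in zip(intra, cross)])
    return completed


# Toy example: two memory slots and two modalities with one missing frame each.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
streams = {
    "text": [[1.0, 0.0], [1.0, 0.0], None, [1.0, 0.0]],
    "audio": [[0.0, 4.0], None, [2.0, 2.0], [0.0, 4.0]],
}
print(complete_modality(streams, "text", keys, values)[2])  # [8.0, 0.5]
```

In the toy run, the text stream is 75% observed, so its memory estimate `[10.0, 0.0]` receives weight 0.75 and the co-occurring audio frame `[2.0, 2.0]` receives 0.25. A learned gate, as the abstract's "adaptive cross-modal completion" suggests, would replace this fixed heuristic.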

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 22813