Group-aware Multiscale Ensemble Learning for Test-Time Multimodal Sentiment Analysis

Published: 20 Jan 2026 · Last Modified: 27 Jan 2026 · AAAI 2026 · CC BY 4.0
Abstract: Multimodal Sentiment Analysis (MSA) enables machines to perceive human sentiment by integrating multiple modalities such as text, video, and audio. Despite recent progress, most existing methods assume distribution consistency between training and test data—a condition rarely met in real-world scenarios. To address domain shifts without relying on source data or target labels, Test-Time Adaptation (TTA) has emerged as a promising paradigm. However, applying TTA methods to MSA faces two challenges: a representation bottleneck inherent to the regression formulation, and inconsistency in modality fusion caused by modality-specific data augmentation techniques. To overcome these issues, we propose Group-aware Multiscale Ensemble Learning (GMEL), which leverages a von Mises-Fisher (vMF) mixture distribution to model latent sentiment groups and integrates a multiscale re-dropout strategy for modality-agnostic feature augmentation, preserving fusion consistency. Extensive experiments on three benchmark datasets with two backbone architectures show that GMEL significantly outperforms existing baselines, demonstrating strong robustness to test-time distribution shifts in multimodal sentiment analysis.
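The abstract names two ingredients: soft assignment of features to latent sentiment groups via a vMF mixture, and modality-agnostic augmentation by re-applying dropout at multiple rates to the fused representation. The sketch below illustrates both in NumPy under stated assumptions; the function names, the shared concentration `kappa` (which lets the vMF normalizing constants cancel in the responsibilities), and the specific dropout rates are illustrative guesses, not the paper's actual parameterization.

```python
import numpy as np

def vmf_responsibilities(x, mus, log_pis, kappa=10.0):
    """Soft assignment of a unit-norm feature x to vMF mixture components.

    Assumes a shared concentration kappa across components, so the vMF
    normalizing constants cancel and the posterior reduces to a softmax
    over log(pi_k) + kappa * <mu_k, x>. Illustrative only.
    """
    logits = log_pis + kappa * (mus @ x)   # one logit per group, shape (K,)
    logits -= logits.max()                 # numerical stability before exp
    p = np.exp(logits)
    return p / p.sum()

def multiscale_redropout(z, rates=(0.1, 0.3, 0.5), rng=None):
    """Modality-agnostic augmentation: re-apply dropout to the fused
    feature z at several rates ("scales"), producing one view per rate."""
    rng = rng or np.random.default_rng(0)
    views = []
    for r in rates:
        mask = rng.random(z.shape) >= r
        views.append(z * mask / (1.0 - r))  # inverted-dropout rescaling
    return views

# Toy usage: 3 latent sentiment groups on the unit sphere in R^4.
rng = np.random.default_rng(0)
mus = rng.normal(size=(3, 4))
mus /= np.linalg.norm(mus, axis=1, keepdims=True)   # unit mean directions
x = mus[1] + 0.05 * rng.normal(size=4)              # feature near group 1
x /= np.linalg.norm(x)
resp = vmf_responsibilities(x, mus, np.log(np.full(3, 1 / 3)))
views = multiscale_redropout(rng.normal(size=16))
```

Because all views come from masking the fused feature rather than from modality-specific transforms (frame jitter, audio noise, text paraphrase), every augmented view passes through the same fusion pathway, which is the consistency property the abstract highlights.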