Label-Guided Enhancement: A Distillation Framework for Uncertainty-Aware Multimodal Emotion Recognition
Abstract: Multimodal Emotion Recognition (MER) is a vital technology for capturing nuanced human emotions by integrating complementary textual, acoustic, and visual cues. However, real-world MER systems frequently encounter missing or conflicting modalities, arising from sensor failures, privacy constraints, or contradictory emotional signals, and these issues compromise the efficacy of existing attention-based fusion models. In this paper, we propose CUMDF, a Counterfactual-based Uncertain Missing-Modality Distillation Framework that addresses these challenges through three core innovations. Specifically, we introduce a Label-Guided Multimodal Masked Transformer (LG-MMT) that aligns features with the target sentiment semantics and improves robustness under incomplete or conflicting data. Furthermore, we design Adaptive and Generalized Knowledge Extractors to disentangle modality-specific information from shared cross-modal patterns, enhancing representational diversity and coherence. Finally, we devise a Modality-Attribution-based Counterfactual Inference (MACI) mechanism that quantifies each modality's causal contribution via counterfactual predictions and dynamically adjusts distillation weights to focus the student model on under-optimized modalities. Experimental results on three benchmark datasets demonstrate that CUMDF outperforms state-of-the-art approaches, highlighting the importance of uncertainty modeling in MER.
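Illustrative sketch (not the paper's implementation): the MACI idea of turning counterfactual predictions into distillation weights could look roughly like the following. The `model`, modality names, and the use of KL divergence as the prediction-shift measure are all assumptions made for illustration; the snippet masks one modality at a time, treats the resulting prediction shift as that modality's causal contribution, and assigns larger distillation weight to modalities with smaller contribution (i.e., under-optimized ones).

```python
import torch
import torch.nn.functional as F

def maci_distillation_weights(model, batch, modalities=("text", "audio", "vision")):
    """Hypothetical sketch of modality-attribution-based counterfactual inference.

    `model` is assumed to accept one tensor per modality as keyword arguments
    and return class logits; masking a modality with zeros stands in for the
    counterfactual "this modality is absent" prediction.
    """
    with torch.no_grad():
        full_logits = model(**batch)  # factual prediction with all modalities present
        contributions = []
        for m in modalities:
            counterfactual = dict(batch)
            counterfactual[m] = torch.zeros_like(batch[m])  # remove modality m
            cf_logits = model(**counterfactual)
            # Causal contribution of m: divergence between factual and counterfactual outputs.
            shift = F.kl_div(
                F.log_softmax(cf_logits, dim=-1),
                F.softmax(full_logits, dim=-1),
                reduction="batchmean",
            )
            contributions.append(shift)
        contributions = torch.stack(contributions)
        # Smaller contribution -> larger distillation weight, so the student
        # concentrates on the modalities it currently exploits least.
        weights = F.softmax(-contributions, dim=0)
    return dict(zip(modalities, weights.tolist()))
```

In this sketch the weights could then scale the per-modality distillation losses in the student's objective; the actual weighting rule and counterfactual construction used by CUMDF may differ.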
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: cross-modal information extraction, automatic speech recognition
Languages Studied: Python
Submission Number: 275