Label-Guided Enhancement: A Distillation Framework for Uncertainty-Aware Multimodal Emotion Recognition
Abstract: Multimodal Emotion Recognition (MER) is a vital technology for capturing nuanced human emotions by integrating complementary textual, acoustic, and visual cues. However, real-world MER systems frequently encounter missing or conflicting modalities, arising from sensor failures, privacy constraints, or contradictory emotional signals, and these issues compromise the efficacy of existing attention-based fusion models. In this paper, we propose CUMDF, a Counterfactual-based Uncertain Missing-Modality Distillation Framework that addresses these challenges through three core innovations. Specifically, we introduce a Label-Guided Multimodal Masked Transformer (LG-MMT) that aligns features with the target sentiment semantics and improves robustness under incomplete or conflicting data. Furthermore, we design Adaptive and Generalized Knowledge Extractors to disentangle modality-specific information from shared cross-modal patterns, enhancing representational diversity and coherence. Finally, we devise a Modality-Attribution-based Counterfactual Inference (MACI) mechanism that quantifies each modality's causal contribution via counterfactual predictions and dynamically adjusts distillation weights to focus the student model on under-optimized modalities. Experimental results on three benchmark datasets demonstrate that CUMDF outperforms state-of-the-art approaches, highlighting the importance of uncertainty modeling in MER.
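Illustrative sketch (not the paper's implementation): the MACI idea of turning counterfactual predictions into distillation weights could look roughly like the following. The `model`, modality names, and the use of KL divergence as the prediction-shift measure are all assumptions made for illustration; the snippet masks one modality at a time, treats the resulting prediction shift as that modality's causal contribution, and assigns larger distillation weight to modalities with smaller contribution (i.e., under-optimized ones).

```python
import torch
import torch.nn.functional as F

def maci_distillation_weights(model, batch, modalities=("text", "audio", "vision")):
    """Hypothetical sketch of modality-attribution-based counterfactual inference.

    `model` is assumed to accept one tensor per modality as keyword arguments
    and return class logits; masking a modality with zeros stands in for the
    counterfactual "this modality is absent" prediction.
    """
    with torch.no_grad():
        full_logits = model(**batch)  # factual prediction with all modalities present
        contributions = []
        for m in modalities:
            counterfactual = dict(batch)
            counterfactual[m] = torch.zeros_like(batch[m])  # remove modality m
            cf_logits = model(**counterfactual)
            # Causal contribution of m: divergence between factual and counterfactual outputs.
            shift = F.kl_div(
                F.log_softmax(cf_logits, dim=-1),
                F.softmax(full_logits, dim=-1),
                reduction="batchmean",
            )
            contributions.append(shift)
        contributions = torch.stack(contributions)
        # Smaller contribution -> larger distillation weight, so the student
        # concentrates on the modalities it currently exploits least.
        weights = F.softmax(-contributions, dim=0)
    return dict(zip(modalities, weights.tolist()))
```

In this sketch the weights could then scale the per-modality distillation losses in the student's objective; the actual weighting rule and counterfactual construction used by CUMDF may differ.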
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: cross-modal information extraction, automatic speech recognition
Languages Studied: Python
Submission Number: 275