Keywords: Facial Emotion Recognition, Data-Centric Deep Learning, Model-Centric Optimization, Multi-Teacher Knowledge Distillation, Robust Emotion Recognition
TL;DR: We present a unified data- and model-centric framework for robust facial emotion recognition, combining dataset refinement, hybrid CNN–Transformer architecture, and multi-teacher knowledge distillation with adversarial training.
Abstract: Recent progress in Deep Learning (DL) has shown that data quality constrains generalization as much as model design. Facial Emotion Recognition (FER) exemplifies this challenge, as widely used datasets contain mislabeled, duplicated, class-imbalanced, and visually degraded samples that weaken both accuracy and robustness. In this paper we propose a data-centric approach to FER, building a systematic pipeline that improves dataset reliability before model training. The pipeline includes (i) removal of noisy and duplicated samples, (ii) landmark-guided facial refinement, and (iii) class-aware re-balancing of under-represented emotions. Building on this pipeline, we propose a lightweight hybrid CNN–Transformer student model with Emotion-Aware Dynamic Distillation (EADD), in which knowledge is adaptively distilled from multiple teacher networks according to their emotion-specific strengths. Beyond multi-teacher knowledge distillation, the student model is further optimized with adversarial training to enhance its robustness against the subtle perturbations encountered in real-world FER. Extensive experiments on FER2013 and KDEF show that our approach achieves state-of-the-art accuracy–robustness–efficiency trade-offs for real-time FER on edge devices. The results demonstrate that systematic data refinement is as critical as model innovation. Source code for reproducing the paper's results is publicly available at https://github.com/anonymous123810/ICLR2026.
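The emotion-adaptive weighting behind EADD can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the function names (`eadd_soft_targets`, `kd_loss`) and the choice of per-class validation accuracy as the teacher-weighting signal are hypothetical; the paper only states that teachers are weighted by their emotion-specific strengths.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax along the last axis.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def eadd_soft_targets(teacher_logits, teacher_class_acc, labels, T=2.0):
    """Blend teacher distributions, weighting each teacher by a
    (hypothetical) per-emotion strength score for the true label.

    teacher_logits:    (n_teachers, batch, n_classes)
    teacher_class_acc: (n_teachers, n_classes), e.g. per-class val accuracy
    labels:            (batch,) ground-truth emotion indices
    """
    w = teacher_class_acc[:, labels]            # (n_teachers, batch)
    w = w / w.sum(axis=0, keepdims=True)        # normalize over teachers
    probs = softmax(teacher_logits, T)          # softened teacher outputs
    return (w[..., None] * probs).sum(axis=0)   # (batch, n_classes)

def kd_loss(student_logits, soft_targets, T=2.0):
    # Cross-entropy of student softened outputs against blended targets,
    # scaled by T^2 as in standard knowledge distillation.
    log_p = np.log(softmax(student_logits, T) + 1e-12)
    return -(soft_targets * log_p).sum(axis=-1).mean() * T * T
```

In practice this soft-target loss would be combined with the hard-label cross-entropy and an adversarial-training objective; the sketch only shows how teachers strong on the current sample's emotion dominate the distilled target.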
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17329