Abstract: Facial Expression Recognition (FER) shows promising applicability in various real-world contexts, including criminal investigations and digital entertainment. Existing cross-domain FER methods primarily focus on spatial-domain features, which are sensitive to noise; as a result, these methods may propagate noise from the source domain to unseen target domains, degrading recognition performance. To address this, we propose a Noise-Robust and Generalizable framework for FER (NR-GFER), mainly comprising a Residual Adapter (RA) module, a Fourier Prompt (FP) module, and a cross-stage unified fusion mechanism. Specifically, the RA module flexibly transfers the generalization ability of a large vision-language model to FER, leveraging a residual mechanism to improve the discriminative ability of spatial-domain features. However, the domain gap may lead FER models to capture source-domain-specific noise, which adversely affects performance on target domains. To mitigate this, the FP module extracts frequency-domain features via the Fourier transform, integrates them with prompts, and reconstructs them in the spatial domain through the inverse Fourier transform, reducing the negative impact of source-domain noise. Finally, the cross-stage unified fusion mechanism bridges intra-module and inter-module semantic priorities, simplifying hyperparameter optimization. Comprehensive evaluations across seven in-the-wild FER datasets confirm that NR-GFER achieves state-of-the-art performance.
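The FP module's frequency-domain pipeline (Fourier transform, prompt integration, inverse transform) can be illustrated with a minimal numpy sketch. This is an assumption-laden illustration, not the paper's implementation: the function name `fourier_prompt` is hypothetical, and the additive way the prompt is integrated in the frequency domain is a guess at one plausible scheme.

```python
import numpy as np

def fourier_prompt(feat, prompt):
    """Hypothetical sketch of the Fourier Prompt idea: move a spatial
    feature map into the frequency domain, mix in a prompt there, and
    return to the spatial domain via the inverse FFT."""
    # Forward 2-D Fourier transform of the spatial feature map.
    freq = np.fft.fft2(feat)
    # Integrate the prompt in the frequency domain (additive here; the
    # paper's actual integration scheme is not specified in the abstract).
    freq = freq + prompt
    # The inverse transform reconstructs spatial-domain features that
    # carry the prompt's frequency-domain adjustment.
    return np.real(np.fft.ifft2(freq))

feat = np.random.rand(8, 8)
zero_prompt = np.zeros((8, 8))
# A zero prompt reduces to an FFT round-trip, recovering the input.
print(np.allclose(fourier_prompt(feat, zero_prompt), feat))  # True
```

Because the forward and inverse transforms are exact inverses, the prompt is the only place where frequency-domain information is altered, which is what lets such a module suppress source-domain noise without otherwise distorting the features.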
DOI: 10.1016/j.ins.2025.122457