FaceMoE: Mixture of Experts for Low-Resolution Face Recognition

05 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | CC BY 4.0
Keywords: Low-resolution Face Recognition
TL;DR: We add a mixture-of-experts modification to the transformer backbone that yields state-of-the-art performance on low-resolution face recognition.
Abstract: Low-resolution face recognition (LR-FR) remains a challenging task due to poor feature extraction and aggregation, as probe images often contain limited identity information resulting from extreme degradations such as blur, occlusion, and low contrast. Additionally, the domain gap between high-resolution (HR) gallery images and low-resolution (LR) probe images poses a significant challenge. A single feature encoder struggles to generalize effectively across both domains when fine-tuned on an LR dataset, and this issue is further magnified by catastrophic forgetting. To address these challenges, we propose FaceMoE, a novel transformer-based architecture enhanced with a Mixture of Experts (MoE) design. Specifically, we introduce multiple specialized feed-forward network (FFN) experts and incorporate a top-k router, which dynamically assigns tokens to appropriate experts. This design promotes specialization across experts for different semantic regions of the face, which enables FaceMoE to perform resolution-aware feature extraction. Moreover, the top-k router facilitates sparse expert activation, enabling the model to preserve pretrained knowledge when fine-tuned on an LR dataset, while increasing model capacity without proportional computational overhead. FaceMoE is trained with a combined face recognition loss, router z-loss, and load balancing loss to ensure expert specialization and stable training. To the best of our knowledge, this is the first work leveraging MoE for LR-FR. Extensive experiments across eleven datasets, spanning HR, mixed-quality, and LR benchmarks, demonstrate that FaceMoE significantly outperforms state-of-the-art methods, particularly in low-resolution face recognition. Code and models will be made public.
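Illustrative sketch (not the authors' code): the abstract does not give the exact router or loss formulations, so the example below assumes the forms commonly used in the MoE literature. It assumes a softmax router with renormalized top-k weights, a z-loss equal to the mean squared log-sum-exp of the router logits, and a load balancing loss equal to the number of experts times the sum over experts of (fraction of routed slots) times (mean router probability). The function names `softmax` and `route_top_k` are hypothetical, and the face recognition loss and the FFN experts themselves are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Assign each token to its top-k experts and compute auxiliary losses.

    router_logits: list of per-token lists, each of length num_experts.
    Returns (assignments, z_loss, balance_loss), where assignments[t] is a
    list of (expert_index, weight) pairs for token t.
    All formulas here are assumed, not taken from the paper.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    assignments = []
    z_loss = 0.0
    slot_fraction = [0.0] * num_experts  # share of routed slots per expert
    mean_prob = [0.0] * num_experts      # average router probability per expert

    for logits in router_logits:
        probs = softmax(logits)
        # Pick the k most probable experts (ties keep the lower index first).
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        assignments.append([(e, probs[e] / norm) for e in top])

        # Assumed z-loss: squared log-sum-exp of the logits, averaged over tokens.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        z_loss += lse ** 2 / num_tokens

        for e in top:
            slot_fraction[e] += 1.0 / (num_tokens * k)
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens

    # Assumed load balancing loss: num_experts * sum_e f_e * P_e.
    balance_loss = num_experts * sum(
        f * p for f, p in zip(slot_fraction, mean_prob)
    )
    return assignments, z_loss, balance_loss
```

In a sketch like this, each token's output would be the weighted sum of the selected experts' FFN outputs, and the two auxiliary terms would be added to the recognition loss with small coefficients; those coefficients are not stated in the abstract.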
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2478