Efficient Face Recognition via Representative Synthetic Data

Ching-Hsun Chang, Ming-Hao Lee, Yu-Hsuan Chiu, Yi-Min Liao, Sheng-Luen Chung, Gee-Sern Hsu

Published: 2025, Last Modified: 28 Feb 2026, ICMEW 2025, CC BY-SA 4.0

Abstract: While face recognition has progressed with better architectures, loss functions, and datasets, issues such as label noise, privacy, and class imbalance remain. Synthetic datasets address these challenges by allowing controlled attribute manipulation and by reducing both data collection costs and privacy risks. We propose a two-phase approach to enhance face recognition with synthetic data. Phase 1 builds a base model using datasets from DCFace and CemiFace while simulating the intra-class distribution of CASIA-WebFace, achieving competitive performance with just 0.16M images, versus 0.55M for DCFace and CemiFace. Phase 2 leverages the Source Face Generator (SFG) and Style-Transfer Diffusion (STD) to generate high-quality synthetic faces. We then propose a Find Representative Samples (FRS) scheme, which uses the base model to select representative samples from the synthetic data produced by SFG and STD for training. Our method reduces data requirements and computational costs while achieving state-of-the-art face recognition performance.
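
The abstract does not state the criterion FRS uses to decide which samples are representative. As a minimal sketch of the general idea only, the following assumes one plausible criterion: embed each synthetic image with the base model, then keep the k samples per identity whose embeddings are closest (by cosine similarity) to that identity's mean embedding. The function name `select_representatives`, the centroid criterion, and the input layout are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def select_representatives(
    embeddings: Dict[str, List[Vector]], k: int
) -> Dict[str, List[int]]:
    """Return, per identity, the indices of the k samples nearest the class centroid.

    `embeddings` maps an identity label to the base-model embeddings of its
    synthetic images. Centroid proximity is an assumed stand-in for the
    paper's FRS criterion, which the abstract does not describe.
    """
    selected: Dict[str, List[int]] = {}
    for identity, vectors in embeddings.items():
        unit = [l2_normalize(v) for v in vectors]
        dim = len(unit[0])
        # Mean of the unit embeddings, re-normalized to unit length.
        centroid = l2_normalize(
            [sum(u[d] for u in unit) / len(unit) for d in range(dim)]
        )
        # Cosine similarity of every sample to the centroid.
        scores = [sum(a * b for a, b in zip(u, centroid)) for u in unit]
        ranked = sorted(range(len(unit)), key=lambda i: scores[i], reverse=True)
        # Keep the top-k indices, reported in their original order.
        selected[identity] = sorted(ranked[:k])
    return selected
```

In practice the embeddings would come from the Phase 1 base model, and k would control how far the synthetic set is reduced; other criteria (for example, covering the spread of the class rather than its center) would fit the same interface.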

External IDs: dblp:conf/icmcs/ChangLCLCH25