On Which Data Distribution (Synthetic or Real) We Should Rely for Soft Biometric Classification

Manju R. A, Atul Kumar, Akshay Agarwal

Published: 2025, Last Modified: 07 Nov 2025, WACV 2025, CC BY-SA 4.0

Abstract: Gender identification is critical not only for human-computer interaction but also for narrowing the search space in which an identity needs to be determined. Traditionally, computer vision algorithms for gender identification are trained on "real" facial images. With growing privacy concerns and advances in generative networks, synthetic face images are increasingly produced and can be used for several face-related tasks, including gender classification. However, how their effectiveness compares to that of real images for gender classification remains unexplored. In response, this study examines gender classification networks trained on real and on synthetic face images, offering novel insights into the relative merits of these two data distributions. To this end, we implement several state-of-the-art gender classification architectures covering convolutional neural networks (CNNs) and vision transformers (ViTs). Our research is built on a rigorous evaluation of 8 deep neural networks (DNNs) across 4 diverse datasets and 6 types of image corruptions. To make the results interpretable, we also use several explainability techniques, including Grad-CAM and t-SNE visualizations. In brief, the contribution of this research is twofold: (i) it clarifies the effectiveness of real vs. synthetic data distributions for network training, and (ii) it assesses whether synthetic data reflect the true physical-world distribution closely enough that models trained on them are resilient against image perturbations.

External IDs: dblp:conf/wacv/AK025