A Relative Data Diversity Measure for Synthetic Face Images

Published: 01 Jan 2024, Last Modified: 17 Feb 2025 · IJCB 2024 · CC BY-SA 4.0
Abstract: Assessing the nature of synthetic images created by generative models is crucial for ensuring their usefulness in downstream visual recognition tasks. In addition to qualitative evaluation of realism, image generation processes are quantitatively assessed in terms of fidelity and diversity. Existing measures account for the proximity of the generated data distribution to the real training data, but lack an understanding of the key attributes that characterize diverse real datasets. In this work, we investigate the properties of generated synthetic face images with respect to the data used during training. We define a relative data diversity (D2D) measure that captures both mutable and immutable aspects of face images and can represent the gain or loss in diversity between two datasets. Through comprehensive experiments using GANs and DDPMs on two face image datasets (VGG-Face2 and BUPT-BalancedFace), we show that our data diversity measure captures variability among images more meaningfully than existing metrics. Additionally, the proposed D2D measure can capture information leakage from training samples to generated images, highlighting privacy-related issues in the generation process. Since both privacy and diversity are desired properties for synthetic face image generation, the proposed measure provides a more robust evaluation of generative methods than other existing measures.