Impact of Optimizing Synthetic Image Similarity on Downstream Task Augmentation

NLDL 2026 Conference Submission11 Authors

29 Aug 2025 (modified: 05 Nov 2025). Submitted to NLDL 2026. License: CC BY 4.0
Keywords: Machine Learning, Medical Imaging, Generative AI
TL;DR: We present three novel algorithms that can improve the similarity of synthetic data to its training data, but this improvement has no significant impact on the synthetic data's effectiveness for augmenting downstream tasks.
Abstract: Using generative machine learning to produce synthetic medical data is an increasingly common way to augment limited datasets in segmentation and classification tasks. Typically, the quality of the synthetic data is judged by its similarity to the training data, as measured by the Fréchet Inception Distance (FID). In this paper we present three synthetic image selection algorithms that can be applied to GAN and diffusion models after training, with the aim of improving both the quality of the synthetic dataset and its downstream augmentation effectiveness. Our study shows that the algorithms consistently and significantly improve the FID of GAN generations (up to a 27.38% reduction), whereas the results for diffusion models are mixed. Additionally, this improvement in FID has no significant impact on the downstream augmentation effectiveness of either model. This suggests that optimising the FID is not an effective way to improve the augmentation efficacy of synthetic data.
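For readers unfamiliar with the metric, the following is a minimal sketch of the standard Fréchet distance between two Gaussians fitted to feature sets, which is the quantity FID reports when the features come from an Inception network. It is not the selection algorithms of this submission. The function name `frechet_distance` and the use of arbitrary NumPy feature arrays in place of real Inception activations are assumptions made for illustration.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    In FID these would be Inception activations; here any features work
    (an assumption made to keep the sketch self-contained).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in
    # exact arithmetic; clip small negative values from rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))             # stand-in for real-image features
    fake = rng.normal(loc=0.3, size=(500, 8))    # stand-in for synthetic-image features
    print(frechet_distance(real, fake))
```

A lower value means the two feature distributions are closer in mean and covariance. The abstract's finding is that lowering this value did not, by itself, translate into better downstream augmentation.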
Serve As Reviewer: ~Thomas_Wallace1
Submission Number: 11