Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification

Published: 06 May 2025, Last Modified: 06 May 2025 | SynData4CV | Everyone | CC BY 4.0
Keywords: bias mitigation, diffusion-based data augmentations, fairness
Abstract: Image classification systems inherit biases from uneven group representation in their training data. In face datasets, for example, blond hair is disproportionately associated with women, which reinforces stereotypes. A recent approach uses Stable Diffusion to generate balanced training data, but the off-the-shelf model often fails to preserve the original data distribution. In this work, we explore multiple diffusion fine-tuning techniques, such as LoRA and DreamBooth, that learn directly from a group's samples and so generate images that represent that training group more accurately. We propose Clustered DreamBooth, which clusters the images of each group and trains a separate model per cluster to handle intra-group diversity. Using these models, we generate images uniformly across groups to pre-train a classification model, which is then fine-tuned on real data. Experiments on multiple benchmarks show that the studied fine-tuning approaches, especially Clustered DreamBooth, outperform vanilla Stable Diffusion on average. They achieve results comparable to state-of-the-art debiasing techniques such as Group-DRO and surpass them as dataset bias becomes more severe.
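The uniform generation step in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `allocate_generation_budget`, the toy groups, and the rule for splitting a group's budget among its clusters (proportional to cluster size) are all assumptions made for illustration. The abstract only states that images are generated uniformly across groups, so the sketch gives every group the same total and leaves the within-group split as one plausible choice. It also assumes every group contains at least one real image.

```python
from collections import Counter


def allocate_generation_budget(cluster_ids_by_group, images_per_group):
    """Split a uniform per-group synthetic-image budget across each group's clusters.

    cluster_ids_by_group: dict mapping a group key, e.g. (label, attribute),
        to a list of cluster ids, one per real image in that group.
    images_per_group: number of synthetic images to generate for every group.

    Assumption (not from the paper): within a group, the budget is split in
    proportion to cluster size, with leftovers going to the largest remainders.
    """
    budget = {}
    for group, cluster_ids in cluster_ids_by_group.items():
        sizes = Counter(cluster_ids)
        total = sum(sizes.values())
        # Exact proportional share and its integer floor for every cluster.
        shares = {c: images_per_group * n / total for c, n in sizes.items()}
        counts = {c: int(s) for c, s in shares.items()}
        # Hand out the images lost to rounding, largest fractional part first.
        leftover = images_per_group - sum(counts.values())
        by_remainder = sorted(sizes, key=lambda c: shares[c] - counts[c], reverse=True)
        for c in by_remainder[:leftover]:
            counts[c] += 1
        budget[group] = counts
    return budget


# Hypothetical toy data: two groups of very different size.
toy = {
    ("blond", "female"): [0] * 6 + [1] * 3 + [2] * 1,
    ("blond", "male"): [0] * 2 + [1] * 1,
}
plan = allocate_generation_budget(toy, images_per_group=100)
```

In this toy setting the smaller group receives the same total of 100 synthetic images as the larger one, which is the balancing effect the abstract describes. Each per-cluster count would then be requested from the model fine-tuned on that cluster.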
Submission Number: 26