DiversiNet: Mitigating Bias in Deep Classification Networks across Sensitive Attributes through Diffusion-Generated Data

Published: 01 Jan 2024 · Last Modified: 18 Jul 2025 · IJCB 2024 · License: CC BY-SA 4.0
Abstract: Deep learning models trained on sensitive data often exhibit biases toward certain demographic groups, posing fairness challenges, especially when datasets are limited. Diffusion-generated data can effectively supplement underrepresented subsets, serving as a regularization technique that enhances feature learning. In addition to the original balanced dataset, we incorporate synthetic data generated by a diffusion model to train classifiers and subsequently assess their performance. Experimental results demonstrate a reduction in bias across all target attributes along with an increase in overall accuracy. For instance, for gender classification on the FFHQ dataset, overall accuracy rises from 93.92% to 94.44% after including diffusion-generated data. Simultaneously, the bias, measured as the absolute difference between the true positive rates of young and old individuals, decreases from 0.0340 to 0.0204, a 40% reduction. Moreover, we extend our analysis to multi-attribute scenarios, successfully mitigating bias with respect to multiple sensitive attributes simultaneously, both in sensitive-attribute classification and in other downstream tasks. To the best of our knowledge, this study introduces a novel approach to bias mitigation, highlighting the versatility of diffusion-based data augmentation in addressing biases concerning age, gender, and race.
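The bias measure used in the abstract, the absolute difference in true positive rates between two demographic groups (e.g. young vs. old), can be sketched as below. The function name, signature, and example data are illustrative, not taken from the paper.

```python
import numpy as np

def tpr_gap(y_true, y_pred, group):
    """Absolute difference in true positive rates between two groups.

    Illustrative implementation of the paper's bias metric:
    |TPR(group 0) - TPR(group 1)| on the positive class.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for g in np.unique(group):
        # True positive rate within group g: correct positives / actual positives
        mask = (group == g) & (y_true == 1)
        tprs.append((y_pred[mask] == 1).mean())
    return abs(tprs[0] - tprs[1])

# Toy example: group 0 recovers 3 of 4 positives (TPR 0.75),
# group 1 recovers 1 of 2 positives (TPR 0.50), so the gap is 0.25.
y_true = [1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 0]
group  = [0, 0, 0, 0, 1, 1]
print(tpr_gap(y_true, y_pred, group))  # 0.25
```

A gap of 0 indicates equal opportunity across the two groups; the paper reports this quantity dropping from 0.0340 to 0.0204 after augmenting with diffusion-generated samples.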