Towards Fairness in Machine Learning: Balancing Racially Imbalanced Datasets Through Data Augmentation and Generative AI

Anthonie Schaap, Sofoklis Kitharidis, Niki van Stein

Published: 01 Jan 2024, Last Modified: 05 Feb 2025, IJCCI 2024, CC BY-SA 4.0

Abstract: AI models trained on facial images are often heavily biased towards certain ethnic groups because their training data contain unrepresentative ethnicity splits. This study examines the ethnic biases in facial recognition models that result from such skewed datasets. We evaluated various data augmentation and generative AI techniques for mitigating these biases, using fairness metrics to measure improvement. Our methodology balanced the training datasets with synthetic data generated by Generative Adversarial Networks (GANs), targeting underrepresented ethnic groups. Experimental results indicate that these interventions reduce bias and improve the fairness of the models across ethnicities. This research contributes practical approaches for correcting dataset imbalances in AI systems, with the aim of improving the reliability and ethical deployment of facial recognition technologies.
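The balancing and measurement steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the fairness metrics used, so the accuracy gap between groups is an assumption chosen for simplicity, and the function names and group labels are hypothetical. The GAN-based image generation itself is not shown; the sketch only computes how many synthetic samples each group would need to match the largest group.

```python
from collections import Counter


def augmentation_targets(group_labels):
    """Number of synthetic samples each group needs to match the largest group.

    Assumption: "balanced" means every group reaches the size of the
    largest group; other balancing targets are possible.
    """
    counts = Counter(group_labels)
    target = max(counts.values())
    return {group: target - n for group, n in counts.items()}


def per_group_accuracy(y_true, y_pred, groups):
    """Classification accuracy computed separately for each group."""
    correct = Counter()
    total = Counter()
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(y_true, y_pred, groups):
    """Difference between the best and worst per-group accuracy.

    Assumption: a simple stand-in for a fairness metric; 0.0 means
    all groups are classified equally well.
    """
    acc = per_group_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


if __name__ == "__main__":
    # Hypothetical group labels for a small, imbalanced training set.
    train_groups = ["group_a"] * 6 + ["group_b"] * 2
    print(augmentation_targets(train_groups))

    # Hypothetical predictions on a small evaluation set.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 1, 0, 1]
    eval_groups = ["group_a", "group_a", "group_b", "group_b"]
    print(accuracy_gap(y_true, y_pred, eval_groups))
```

Comparing the gap for a model trained on the original data with the gap for a model trained on the balanced data would give one rough indication of whether the intervention reduced bias.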