SynthFair: A Semi-Synthetic Medical Imaging Dataset to Propel Research on Bias Detection & Mitigation

Fabio De Sousa Ribeiro; Estanislao Claucich; Emma A.M. Stanley; Panos Dimitrakopoulos; Sotirios A. Tsaftaris; Enzo Ferrante; Ben Glocker; Rodrigo Echeveste

Published: 24 Sept 2025, Last Modified: 26 Dec 2025 | NeurIPS2025-AI4Science Poster | CC BY 4.0

Track: Track 2: Dataset Proposal Competition

Keywords: algorithmic fairness, bias mitigation, medicine, generative AI, counterfactuals

TL;DR: SynthFair proposes a massive semi-synthetic dataset of medical images using cutting-edge generative AI to enable urgently needed research into biases related to protected attributes and spurious correlations, as well as strategies to mitigate them.

Abstract: The scarcity of large-scale datasets capable of capturing the rich diversity of the global population is currently a major limitation for the development of equitable AI tools in the medical domain. Underrepresentation of certain subpopulations makes bias auditing and subsequent mitigation difficult in general, and practically infeasible for intersectional studies. Moreover, spurious correlations in these datasets, which are challenging to identify, tend to cause shortcut learning, whereby models base their decisions on features unrelated to the task, which may lead to catastrophic failure at test time. Current fairness benchmarks are not representative of real-world data, making it difficult to draw conclusions that are relevant for clinical practice. SynthFair aims to bridge this gap by leveraging cutting-edge generative AI (GenAI). GenAI will allow us to create a massive semi-synthetic dataset of chest X-ray images, augmenting a rich international collection of databases by means of counterfactual image generation. SynthFair is the result of an international collaboration with a proven track record in synthetic image creation, database curation, and bias detection and mitigation in the context of medical imaging.

Submission Number: 259
