Keywords: Privacy, Fairness, Image Generation
TL;DR: Use prior knowledge to quantify fairness and privacy in diffusion models
Abstract: The recent surge in options for diffusion model-based synthetic data sharing offers significant benefits for medical research, provided privacy and fairness concerns are addressed.
Generative models risk memorizing sensitive training samples, potentially exposing identifiable information.
Simultaneously, underrepresented features -- such as rare diseases, uncommon medical devices, or infrequent patient ethnicities -- are often not learned well, creating unfair biases in downstream applications.
Our work unifies these challenges by leveraging artificially generated fingerprints (SAFs) in the training data as a controllable test for memorization and fairness.
Specifically, we measure whether a diffusion model reproduces these fingerprints verbatim (a privacy breach) or ignores them entirely (a fairness violation), and we introduce an indicator t' that quantifies a trained model's likelihood of reproducing training samples.
Extensive experiments on real and synthetic medical imaging datasets reveal that naïve diffusion model training can lead to privacy leaks or unfair coverage.
By systematically incorporating SAFs and monitoring t', we demonstrate how to balance privacy and fairness objectives.
Our evaluation framework provides actionable guidance for designing generative models that preserve patient anonymity without excluding underrepresented patient subgroups. Code is available at https://github.com/MischaD/Privacy.
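Illustration: the following is a minimal sketch, not the authors' implementation, of the core evaluation idea described above: stamp a synthetic fingerprint into part of the training data, then check how often generated samples reproduce it near-verbatim (a privacy-leak proxy) versus never producing it at all (a fairness-coverage proxy). All function names, thresholds, and the correlation-based matching are assumptions for illustration; the actual indicator t' and experimental protocol are defined in the paper and code repository.

```python
import numpy as np

def fingerprint_response(generated: np.ndarray, fingerprint: np.ndarray,
                         match_thresh: float = 0.95) -> dict:
    """Hypothetical helper: generated is (N, H, W) grayscale samples in [0, 1];
    fingerprint is the (h, w) patch stamped into a subset of training images."""
    fp = (fingerprint - fingerprint.mean()) / (fingerprint.std() + 1e-8)
    scores = []
    for img in generated:
        # Normalized correlation between the fingerprint and the image region
        # where it was stamped during training (top-left corner, by assumption).
        patch = img[:fp.shape[0], :fp.shape[1]]
        patch = (patch - patch.mean()) / (patch.std() + 1e-8)
        scores.append(float((patch * fp).mean()))
    scores = np.asarray(scores)
    verbatim_rate = float((scores > match_thresh).mean())  # privacy-leak proxy
    coverage = float((scores > 0.5).mean())                # fairness-coverage proxy
    return {"verbatim_rate": verbatim_rate, "coverage": coverage}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fp = rng.random((8, 8))
    samples = rng.random((16, 64, 64))
    samples[:2, :8, :8] = fp  # pretend two generated samples copied the fingerprint
    print(fingerprint_response(samples, fp))
```

A high verbatim rate would flag memorization of the injected fingerprint, while a coverage near zero would flag that the underrepresented feature was not learned at all; the balanced regime lies between the two.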
Primary Subject Area: Image Synthesis
Secondary Subject Area: Fairness and Bias
Paper Type: Both
Registration Requirement: Yes
Reproducibility: https://github.com/MischaD/Privacy
Submission Number: 25