Keywords: Privacy, Data-sharing, Conditional Image Generation, Diffusion Models
TL;DR: We frame image generation within a singling-out framework and show how generative methods can mitigate privacy issues in data sharing
Abstract: Synthetic data has recently reached a level of visual fidelity that makes it nearly indistinguishable from real data, offering great promise for privacy-preserving data sharing in medical imaging. However, fully synthetic datasets still suffer from significant limitations. First and foremost, the legal aspect of sharing synthetic data is often neglected, and data regulations such as the GDPR are largely ignored. Second, models trained on synthetic data fall short of matching the performance of those trained on real data, even for in-domain downstream applications. Recent methods for image generation have focused on maximising image diversity rather than fidelity, solely to improve mode coverage and thereby the downstream performance of synthetic data. In this work, we shift perspective and highlight how maximising diversity can also be interpreted as protecting natural persons from being singled out, which leads to predicate singling-out (PSO) secure synthetic datasets. Specifically, we propose a generalisable framework for training diffusion models on personal data that yields non-personal synthetic datasets, achieving performance within one percentage point of real-data models while significantly outperforming state-of-the-art methods that do not ensure privacy. Our code is available at https://anonymous.4open.science/r/Trichotomy-C02B.
Submission Number: 1