Keywords: Differential Privacy, Privacy Auditing, Reconstruction Attacks, One Training Run, Synthetic Data
TL;DR: We propose a one-run ε-DP auditing scheme for generative AI training algorithms.
Abstract: Generative models are gaining traction in synthetic data generation but see limited industry adoption because of a lack of standardized metrics for data utility, fidelity, and especially privacy. In this paper, we focus on privacy and propose a practical $\epsilon$-differential privacy auditing technique for structured generative and foundation models that measures memorization via nearest-neighbor distances between real training data and generated synthetic samples. By independently selecting a small subset of the training data for auditing, our method operates in a single training run and treats the generative pipeline as a black box. Our approach treats synthetic samples as the outputs of a reconstruction attack and yields significantly stronger lower bounds on privacy loss than traditional membership inference attacks. We evaluate our technique on five tabular generative models and one foundation model, showing that it provides a robust baseline for evaluating the privacy of generative models.
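For intuition, below is a minimal sketch of the kind of nearest-neighbor audit the abstract describes. Everything here is an illustrative assumption rather than the paper's exact procedure: the Euclidean metric, the threshold calibrated on held-out records, the function names, and the epsilon estimator (a simple membership-style log-ratio bound with a Clopper-Pearson correction, whereas the paper reports stronger reconstruction-based bounds).

```python
# Hypothetical sketch of a black-box nearest-neighbor privacy audit.
# Not the authors' implementation; names and estimator are assumptions.
import numpy as np
from scipy.stats import binomtest

def nn_distances(queries, reference):
    """Euclidean distance from each query row to its nearest reference row."""
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

def audit_epsilon_lower_bound(members, non_members, synthetic, alpha=0.05):
    """Flag an audit record as 'memorized' if some synthetic sample lies
    closer to it than a threshold calibrated on held-out (non-member)
    records, then convert flag rates into a crude one-sided epsilon lower
    bound (illustrative estimator, not the paper's)."""
    d_in = nn_distances(members, synthetic)
    d_out = nn_distances(non_members, synthetic)
    tau = np.quantile(d_out, 0.05)           # target ~5% false positive rate
    fpr = max((d_out <= tau).mean(), 1e-6)
    # Clopper-Pearson lower confidence bound on the true positive rate,
    # so the reported epsilon is conservative.
    k, n = int((d_in <= tau).sum()), len(d_in)
    tpr_lo = binomtest(k, n).proportion_ci(confidence_level=1 - alpha).low
    return np.log(max(tpr_lo, 1e-6) / fpr)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 8))
    members, non_members = real[:100], real[100:]
    # Stand-in for a generative model's output: synthetic data that
    # near-duplicates (leaks) 20 of the member records.
    synthetic = np.vstack([members[:20] + 0.01 * rng.normal(size=(20, 8)),
                           rng.normal(size=(180, 8))])
    eps = audit_epsilon_lower_bound(members, non_members, synthetic)
    print(f"epsilon lower bound: {eps:.2f}")
```

The design point this sketch captures is that the audit only needs samples from the trained pipeline (black-box access) and a single training run, because the flagged subset was chosen independently before training.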
Submission Number: 26