Mind the Privacy Budget: How Generative Models Spend their Privacy Budgets

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: synthetic data, differential privacy, generative models, graphical models, GANs
TL;DR: We analyze the specific steps in which different DP generative approaches ``spend'' their privacy budgets and evaluate the effects on downstream task performance with increasingly wider and taller training datasets.
Abstract: Numerous Differentially Private (DP) generative models have been proposed that aim to produce synthetic data while minimizing privacy risks. As no single model works well in all settings, empirical analysis is needed to establish and optimize trade-offs vis-\`a-vis the intended use of the synthetic data. In this paper, we identify and address several challenges in the empirical evaluation of such models. First, we analyze the steps in which different algorithms ``spend'' their privacy budget, and we evaluate the effects on downstream task performance to identify the problem settings in which each is most likely to succeed. We then experiment with increasingly wider and taller training sets with various features, decreasing privacy budgets, and different DP mechanisms and generative models. Our empirical evaluation, performed on both graphical and deep generative models, sheds light on the distinctive features of different models/mechanisms that make them well-suited for different settings and tasks. Graphical models distribute the privacy budget horizontally across features and thus cannot handle relatively wide datasets, while their performance on the task they are optimized for increases monotonically with more data. Deep generative models spend their budget per training iteration, so their behavior is less predictable as dataset dimensions vary, but they can perform better when trained on more features. Also, low levels of privacy ($\epsilon \geq 100$) can help some models generalize, achieving better results than training without DP.
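To make the two budget-spending patterns in the abstract concrete, here is a minimal illustrative sketch (not the paper's implementation). It assumes basic sequential composition for simplicity; real systems typically use tighter accounting (e.g., RDP / moments accountant). The names `noisy_marginal` and `dp_sgd_step` are hypothetical helpers introduced here for illustration.

```python
# Illustrative sketch, NOT the paper's code: contrasting how a total budget
# epsilon is "spent" horizontally across features vs. per training iteration,
# under basic sequential composition.
import numpy as np

rng = np.random.default_rng(0)

def noisy_marginal(counts, eps, sensitivity=1.0):
    """Laplace mechanism on a one-way marginal (a histogram of counts)."""
    return counts + rng.laplace(scale=sensitivity / eps, size=counts.shape)

total_eps = 1.0
data = rng.integers(0, 4, size=(1000, 8))  # 8 categorical features, domain {0..3}

# Graphical-model style: split the budget "horizontally" across features,
# so each marginal gets less budget (more noise) as the dataset gets wider.
eps_per_marginal = total_eps / data.shape[1]
marginals = [
    noisy_marginal(np.bincount(data[:, j], minlength=4), eps_per_marginal)
    for j in range(data.shape[1])
]

# Deep-generative style (e.g., DP-SGD): spend the budget per training step,
# so each extra iteration costs privacy regardless of dataset width.
iterations = 200
eps_per_step = total_eps / iterations

def dp_sgd_step(grad, eps_step, clip=1.0, delta=1e-5):
    """One noisy gradient step: clip the gradient to bound sensitivity, then
    add Gaussian noise calibrated via the classic analytic Gaussian-mechanism
    bound (valid for eps_step < 1; simplified, illustrative only)."""
    grad = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    sigma = clip * np.sqrt(2 * np.log(1.25 / delta)) / eps_step
    return grad + rng.normal(scale=sigma, size=grad.shape)
```

Under this (loose) composition, widening the dataset degrades every marginal in the first scheme, while in the second the cost depends only on the number of iterations, matching the qualitative behavior the abstract describes.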
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)