Keywords: Rényi Differential Privacy, Reconstruction Attacks, Information Theory
TL;DR: Quantify the information leakage using better reconstruction bounds backed by experimental testing
Abstract: Reconstruction attacks allow an adversary to regenerate data samples of the training set using only access to a trained model. It has recently been shown that simple heuristics can reconstruct data samples from language models, making this threat scenario an important aspect of model release. Differential privacy is a known solution to such attacks, but it is often used with a large privacy budget (epsilon > 8) that does not translate to meaningful guarantees. In this paper we show that, for the same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature. In particular, we show that larger privacy budgets do not provably protect against membership inference, but can still protect against extraction of rare secrets. We design a method to efficiently run reconstruction attacks with lazy sampling and empirically show that we can surface at-risk training samples from non-private language models. We show experimentally that our guarantees hold on real-life language models trained with differential privacy for difficult scenarios, including GPT-2 finetuned on Wikitext-103.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/defending-against-reconstruction-attacks/code)