Keywords: Privacy, language models, synthetic data
TL;DR: We design specialized canaries to audit the privacy of synthetic text generated by LLMs
Abstract: How much information about training examples can be gleaned from synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we investigate the design of membership inference attacks (MIAs) that target data used to fine-tune pre-trained LLMs that are then used to synthesize data, particularly when the adversary does not have access to the fine-tuned LLM but only to a synthetic data corpus. We demonstrate that canaries crafted to maximize vulnerability to attacks with access to the model are sub-optimal for auditing privacy risks when only synthetic data is released. This is because such out-of-distribution canaries have limited influence on the model's output when it is prompted to generate useful, in-distribution synthetic data, which significantly limits their vulnerability to MIAs. To tackle this problem, we leverage the mechanics of auto-regressive models to design canaries that leave detectable traces in synthetic data. Our approach significantly enhances the power of MIAs, providing a better assessment of the privacy risks of releasing synthetic data generated by LLMs.
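To make the data-only attack setting concrete, below is a minimal sketch (not the paper's actual method) of how an adversary with access only to a synthetic corpus might score a canary's membership by measuring how strongly its n-grams surface in the generated text. The function names, the toy canary, and the example corpus are all hypothetical illustrations, assuming a canary designed with an in-distribution prefix (so the fine-tuned model is likely to reproduce it when generating useful synthetic data) and a rare suffix that makes leakage detectable.

```python
from collections import Counter


def ngrams(text, n):
    """Return the list of word-level n-grams of a string."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def canary_membership_score(canary, synthetic_corpus, n=2):
    """Average count, over the synthetic corpus, of the canary's n-grams.

    A higher score suggests the canary influenced the fine-tuned model's
    generations; an MIA would threshold such a signal to decide membership.
    This is an illustrative signal only, not the attack proposed in the paper.
    """
    corpus_counts = Counter()
    for doc in synthetic_corpus:
        corpus_counts.update(ngrams(doc, n))
    canary_grams = ngrams(canary, n)
    if not canary_grams:
        return 0.0
    return sum(corpus_counts[g] for g in canary_grams) / len(canary_grams)


# Hypothetical example: in-distribution prefix + rare, easily traceable suffix.
canary = "the product arrived on time xq7 zebra nimbus"
synthetic_corpus = [
    "the product arrived on time and works great",
    "delivery was slow but the product arrived on time",
]
print(canary_membership_score(canary, synthetic_corpus, n=2))
```

A score well above that of held-out (non-member) canaries would indicate detectable leakage from fine-tuning into the released synthetic data.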
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11003