How much information about training examples can be gleaned from synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we investigate the design of membership inference attacks (MIAs) that target data used to fine-tune pre-trained LLMs which are then used to synthesize data, particularly when the adversary does not have access to the fine-tuned LLM but only to a synthetic data corpus. We demonstrate that canaries crafted to maximize their vulnerability to attacks with access to the model are sub-optimal for auditing privacy risks when only synthetic data is released. This is because such out-of-distribution canaries have limited influence on the model’s output when it is prompted to generate useful, in-distribution synthetic data, which significantly limits their vulnerability to MIAs. To tackle this problem, we leverage the mechanics of auto-regressive models to design canaries that leave detectable traces in synthetic data. Our approach significantly enhances the power of MIAs, providing a better assessment of the privacy risks of releasing synthetic data generated by LLMs.
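
To make the threat model concrete, the sketch below illustrates one plausible way an adversary with access only to a released synthetic corpus (and not to the fine-tuned model) might score a candidate canary for membership: fit a simple smoothed bigram model to the synthetic text and compare the canary's likelihood against a reference corpus. This is a minimal, hypothetical illustration of the setting; the function names, the bigram scoring choice, and the toy data are assumptions, not the attack proposed in the paper.

```python
# Hypothetical sketch: membership scoring against a synthetic corpus only,
# with no access to the fine-tuned model. The bigram model and all names
# here are illustrative assumptions, not the paper's exact attack.
import math
from collections import Counter


def bigram_log_likelihood(text, corpus_tokens, alpha=1.0):
    """Average log-likelihood of `text` under an add-alpha smoothed bigram
    model fit on a token list (e.g., the released synthetic corpus)."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = len(unigrams) or 1
    tokens = text.split()
    score, count = 0.0, 0
    for prev, cur in zip(tokens, tokens[1:]):
        num = bigrams[(prev, cur)] + alpha          # smoothed bigram count
        den = unigrams[prev] + alpha * vocab        # smoothed context count
        score += math.log(num / den)
        count += 1
    return score / max(count, 1)


# Membership signal: a canary whose traces appear in the synthetic data should
# score higher under the synthetic corpus than under an unrelated reference.
synthetic = "the patient reported mild symptoms after the treatment".split()
reference = "the weather was cold and the streets were quiet today".split()
canary = "the patient reported mild symptoms"
mia_score = (bigram_log_likelihood(canary, synthetic)
             - bigram_log_likelihood(canary, reference))
print(f"membership score (higher => more likely a member): {mia_score:.3f}")
```

The key point the sketch conveys is that the attack signal must survive the generation step: a canary only becomes detectable if the fine-tuned model's synthetic output retains measurable traces of it, which is why out-of-distribution canaries that the model never reproduces are poor auditing tools in this setting.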