Privacy Auditing of Large Language Models

Published: 28 Jun 2024, Last Modified: 04 Aug 2024
Venue: NextGenAISafety 2024 Oral
License: CC BY 4.0
Keywords: differential privacy, privacy auditing
Abstract: Understanding how much private information large language models (LLMs) leak is an important research question. The most practical and common way to measure this leakage is a privacy audit, and the first step in a successful audit is a strong membership inference attack. Developing effective membership inference attacks is therefore a major challenge in privacy auditing of LLMs. Current methods rely on basic approaches to generate canaries, which may not be optimal and can underestimate the privacy leakage. In this work, we introduce a novel method to generate more effective canaries for membership inference attacks on LLMs. We demonstrate through experiments on fine-tuned LLMs that our approach significantly improves the detection of privacy leakage compared to existing methods. For non-privately trained LLMs, our attack achieves $64.2\%$ TPR at $0.01\%$ FPR, largely surpassing the previous attack, which achieves $36.8\%$ TPR at $0.01\%$ FPR. Our method can be used to provide a privacy audit of $\varepsilon \approx 1$ for a model trained with a theoretical $\varepsilon$ of 4. To the best of our knowledge, this is the first time that a privacy audit of LLM training has achieved nontrivial auditing success in the setting where the attacker cannot train shadow models, insert gradient canaries, or access the model at every iteration.
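To make the reported metric concrete, the following is a minimal sketch of how "TPR at a fixed FPR" can be computed from membership scores, and how a TPR/FPR pair maps to a rough lower bound on $\varepsilon$. It is not the paper's auditing procedure. The function names and the `delta` parameter are hypothetical, and the bound uses the standard $(\varepsilon, \delta)$-DP inequality $\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta$ as a point estimate, with no confidence intervals.

```python
import math
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Return (tpr, fpr, threshold) for the most permissive threshold
    whose false positive rate does not exceed target_fpr.
    Higher score means "more likely a training member" (an assumption)."""
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.asarray(nonmember_scores, dtype=float)
    # Number of false positives we can afford (small slack for float error).
    k = int(math.floor(target_fpr * len(nonmembers) + 1e-9))
    descending = np.sort(nonmembers)[::-1]
    # Predict "member" when score > threshold; using the (k+1)-th largest
    # non-member score as threshold yields at most k false positives.
    threshold = descending[k] if k < len(descending) else -np.inf
    tpr = float(np.mean(members > threshold))
    fpr = float(np.mean(nonmembers > threshold))
    return tpr, fpr, threshold

def epsilon_point_estimate(tpr, fpr, delta=0.0):
    """Point estimate of a lower bound on epsilon from one (TPR, FPR) pair,
    via TPR <= exp(eps) * FPR + delta. No statistical confidence attached."""
    if tpr <= delta:
        return 0.0
    if fpr == 0.0:
        return math.inf  # not meaningful with finite samples
    return max(0.0, math.log((tpr - delta) / fpr))

if __name__ == "__main__":
    # Toy scores, purely illustrative (not data from the paper).
    members = [0.9, 0.8, 0.7, 0.2]
    nonmembers = [0.95, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1]
    tpr, fpr, thr = tpr_at_fpr(members, nonmembers, target_fpr=0.1)
    print(tpr, fpr, thr, epsilon_point_estimate(tpr, fpr))
```

A real audit would replace the point estimate with confidence bounds on TPR and FPR, since an observed FPR as low as $0.01\%$ is only meaningful when measured on a large number of non-member canaries.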
Submission Number: 147