Keywords: LLM, privacy, differential privacy, DP-SGD, LoRA, private fine-tuning, PromptPATE, PromptDPSGD, adaptations, soft prompt, prefix tuning, hard prompts
TL;DR: Lightweight adaptation methods like LoRA and prefix tuning provide the strongest empirical protection against privacy attacks.
Abstract: Recent work has applied differential privacy (DP) methods to adapt large language models (LLMs) for sensitive applications. While DP offers theoretical privacy guarantees, its practical implications for LLM adaptations remain uncertain. This uncertainty arises from LLM pretraining, where overlap and interdependencies between pretraining and adaptation data can impact privacy leakage despite DP adaptation efforts. To analyze the issue from a practical standpoint, we thoroughly investigate privacy risks under "private" adaptations in LLMs. Relying on the latest privacy attacks, such as robust membership inference, we study the actual privacy risks for the pretraining and adaptation data. We benchmark the privacy risks by systematically varying the distribution of adaptation data, ranging from data perfectly overlapping with the pretraining set through in-distribution (IID) scenarios to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different kinds of adaptation methods and different privacy regimes impact the vulnerability. Our results reveal that distribution shifts significantly affect the vulnerability to privacy attacks: the closer the distribution of the adaptation data is to the pretraining distribution, the higher its practical privacy risk, even when there is no overlap between pretraining and adaptation data. We find that the highest empirical privacy protection is achieved for OOD data using parameter-efficient fine-tuning (PEFT) methods, such as LoRA. Surprisingly, when considering data from the same distribution, using the pretraining data for adaptation exhibits similar privacy leakage to the corresponding validation data. To effectively prevent privacy leakage, the adaptations must be trained with strict differential privacy guarantees. Finally, our results show that private adaptations, especially those done with prefix tuning, can also decrease the empirical leakage from the pretraining data.
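For readers unfamiliar with how such "private" adaptations are trained, the sketch below illustrates a single DP-SGD update applied only to a small set of trainable adapter parameters (e.g., LoRA matrices) while the base LLM stays frozen. This is a minimal, naive illustration under assumed names (`model`, `adapter_params`, `clip_norm`, `noise_multiplier`), not the paper's actual implementation; real experiments would use a vectorized DP library such as Opacus together with a proper privacy accountant.

```python
# Minimal DP-SGD sketch for parameter-efficient adaptation: per-example
# gradient clipping plus Gaussian noise, applied only to adapter parameters.
# All names below are illustrative placeholders, not the paper's setup.
import torch


def dpsgd_step(model, loss_fn, batch, adapter_params, optimizer,
               clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update over the trainable adapter parameters."""
    inputs, targets = batch
    summed_grads = [torch.zeros_like(p) for p in adapter_params]

    # Per-example gradients (naive loop; production code would vectorize).
    for x, y in zip(inputs, targets):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, adapter_params)

        # Clip each example's gradient to L2 norm <= clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed_grads, grads):
            s += g * scale

    # Add Gaussian noise calibrated to the clipping norm, then average.
    batch_size = len(inputs)
    for p, s in zip(adapter_params, summed_grads):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / batch_size

    optimizer.step()
```

Because only the adapter parameters receive (noised) gradients, the base model's weights are untouched; the abstract's findings concern how much empirical protection this actually buys for adaptation and pretraining data under different distribution shifts.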
Submission Number: 90