Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Lynn Chua; Badih Ghazi; Yangsibo Huang; Pritish Kamath; Ravi Kumar; Daogao Liu; Pasin Manurangsi; Amer Sinha; Chiyuan Zhang

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Published: 10 Jul 2024, Last Modified: 26 Aug 2024COLMEveryoneRevisionsBibTeXCC BY 4.0

Research Area: Data, Societal implications, Safety

Keywords: Privacy, User-level privacy, Differential privacy

TL;DR: We present a systematic evaluation of user-level differential privacy for LLM fine-tuning on natural language generation tasks

Abstract: Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are “almost indistinguishable” with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 933

Loading