Privacy in the Time of Language Models

Published: 01 Jan 2023, Last Modified: 01 May 2023 · WSDM 2023
Abstract: Pretrained large language models (LLMs) have consistently shown state-of-the-art performance across multiple natural language processing (NLP) tasks. These models are of much interest for a variety of industrial applications that use NLP as a core component. However, LLMs have also been shown to memorize portions of their training data, which can contain private information. Therefore, when building and deploying LLMs, it is valuable to apply privacy-preserving techniques that protect sensitive data. In this talk, we discuss privacy measurement and preservation techniques for LLMs that can be applied in the context of industrial applications, and we present case studies of preliminary solutions. We discuss select strategies and metrics for measuring memorization in LLMs that can, in turn, be used to measure privacy risk in these models. We then discuss privacy-preservation techniques that can be applied at different points of the LLM training life cycle, including our work on an algorithm for fine-tuning LLMs with improved privacy. In addition, we discuss our work on privacy-preserving solutions that can be applied to LLMs during inference and are feasible for use at run time.
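The abstract does not specify which memorization metrics the talk covers; as one illustrative example, the minimal sketch below computes a canary-exposure style score in the spirit of Carlini et al. (2019): a planted "canary" string is ranked against plausible alternatives by model likelihood, and a high rank suggests possible memorization. The `log_likelihood` callable and the toy data are placeholders for illustration, not part of the talk's method.

```python
import math
from typing import Callable, Sequence

def exposure(canary: str,
             candidates: Sequence[str],
             log_likelihood: Callable[[str], float]) -> float:
    """Rank-based exposure score for a planted canary sequence.

    `log_likelihood` is assumed to wrap the model under test and return the
    total log-probability it assigns to a sequence (higher = more likely).
    Exposure = log2(|candidates|) - log2(rank of the canary), so a canary
    that the model ranks above every alternative receives the maximum score,
    suggesting it may have been memorized during training.
    """
    canary_ll = log_likelihood(canary)
    # Rank 1 means the model finds the canary more likely than all alternatives.
    rank = 1 + sum(1 for c in candidates if log_likelihood(c) > canary_ll)
    return math.log2(len(candidates)) - math.log2(rank)


if __name__ == "__main__":
    # Toy stand-in for a model: pretend it strongly prefers the canary string.
    toy_ll = {"secret-1234": -5.0, "secret-0000": -20.0, "secret-9999": -25.0}
    print(exposure("secret-1234", list(toy_ll), toy_ll.get))  # ~1.58 bits
```

In practice, the candidate set is much larger (e.g., all canaries of a fixed format), and the log-likelihoods come from the LLM under evaluation; the score can then be tracked across the training life cycle to monitor privacy risk.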