TL;DR: A PATE design that has high utility for diverse tasks
Abstract: The Private Aggregation of Teacher Ensembles (PATE) framework is a versatile approach to privacy-preserving machine learning. In PATE, responses computed from different parts of sensitive data are aggregated into a single response in a privacy-preserving way. Recently, multiple works applied PATE to tasks such as sequential text generation that are inherently diverse (or "hot"), with multiple valid responses. These designs, however, suffer from a tension between diversity and privacy: diversity in the responses reduces agreement, which forces the aggregation to use smaller noise scales and thus incur higher privacy loss. Yet limiting the diversity of the aggregate response is undesirable, since in modern large language models the very knowledge we want to transfer is encapsulated in the response distribution.
We propose \emph{hot PATE}, tailored for the diverse setting where responses are distributions. We formally define \emph{preserving diversity} and design an efficient aggregation method that provably transfers the diversity to the (randomized) aggregate response while incurring no privacy penalty. The method can be implemented using API access to proprietary models and used as a plug-in replacement for the baseline ``cold'' PATE in existing tools. We demonstrate empirically the potential of hot PATE for an order-of-magnitude improvement on a task of in-context learning via prompts.
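To make the diversity/privacy tension concrete, the following is a minimal sketch of the baseline ``cold'' PATE aggregator that the abstract contrasts against: each teacher casts one vote, noise is added to the per-class counts, and the noisy argmax is released. The function name and parameters are illustrative, not the paper's hot-PATE method. When teacher votes are split across many valid responses (the ``hot'' regime), the count gap shrinks, so recovering a correct answer requires smaller noise and hence higher privacy loss.

```python
import numpy as np

def noisy_argmax(votes, num_classes, noise_scale, rng=None):
    # Baseline ("cold") PATE aggregation: tally teacher votes,
    # perturb the counts with Laplace noise, release the argmax.
    # Illustrative sketch only; hot PATE aggregates differently.
    rng = rng if rng is not None else np.random.default_rng()
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=noise_scale, size=num_classes)
    return int(np.argmax(counts))

# High agreement: 90 of 100 teachers vote class 2, so the consensus
# survives even substantial noise.
votes = np.array([2] * 90 + [5] * 10)
print(noisy_argmax(votes, num_classes=10, noise_scale=1.0))
```

With a diverse task the 90/10 split above would instead be spread thinly over many classes, and the same noise scale would frequently flip the argmax; that is the failure mode hot PATE is designed to avoid.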
Primary Area: Social Aspects->Privacy
Keywords: PATE, diverse tasks, language generation
Submission Number: 7654