Private Federated Learning using Preference-Optimized Synthetic Data

Charlie Hou; Mei-Yu Wang; Yige Zhu; Daniel Lazar; Giulia Fanti

Private Federated Learning using Preference-Optimized Synthetic Data

Charlie Hou, Mei-Yu Wang, Yige Zhu, Daniel Lazar, Giulia Fanti

Published: 01 May 2025, Last Modified: 16 Aug 2025ICML 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

TL;DR: We cast private on-device learning (under the synthetic data framework) as an LLM preference optimization problem, and greatly improve the state of the art.

Abstract: In practical settings, differentially private federated learning (DP-FL) is the dominant method for training models from private, on-device client data. Recent work has suggested that DP-FL may be enhanced or outperformed by methods that use DP synthetic data (Wu et al., 2024; Hou et al., 2024). The primary algorithms for generating DP synthetic data for FL applications require careful prompt engineering based on public information and/or iterative private client feedback. Our key insight is that the private client feedback collected by prior DP synthetic data methods (Hou et al., 2024; Xie et al., 2024) can be viewed as an RL reward. Our algorithm, Policy Optimization for Private Data (POPri) harnesses client feedback using policy optimization algorithms such as Direct Preference Optimization (DPO) to fine-tune LLMs to generate high-quality DP synthetic data. To evaluate POPri, we release LargeFedBench, a new federated text benchmark for uncontaminated LLM evaluations on federated client data. POPri closes the gap in performance between the fully-private and non-private settings by up to 58%, compared to 28% for prior synthetic data methods, and 3% for state-of-the-art DP federated learning methods. The code and data are available at https://github.com/meiyuw/POPri.

Lay Summary: Smartphones and other personal devices hold valuable text that could teach predictive keyboards and voice assistants to work better for everyone. But collecting this text from personal devices to central data warehouses may violate user privacy. The current solution for this, called federated learning, trains a small copy of a model on each device and shares only scrambled updates, but it so far has not been able to scale to today's large models. Our research shows a different path: let a large language model (LLM) guess private-style text on the server, then train the small on-device model on that synthetic data. To make those guesses accurate without peeking at the real data, each phone simply scores the guessed samples for their closeness to the real data and sends back a noise-blurred version of those scores. We turn these crowd-sourced scores into a powerful fine-tuning recipe called POPri, which teaches the LLM to write ever-better synthetic text. Across several real-world datasets, POPri nearly erases the accuracy gap between fully private training and a no-privacy baseline while keeping users’ original data safe and sound.

Link To Code: https://github.com/meiyuw/POPri

Primary Area: Social Aspects->Privacy

Keywords: Differential privacy, large language models, synthetic data, federated learning, preference optimization, reinforcement learning

Flagged For Ethics Review: true

Submission Number: 15630

Loading