Keywords: Preference Alignment, Differential Privacy, Large Language Models
TL;DR: The first framework that generates differentially private synthetic preference data, enabling privacy-preserving preference alignment of large language models.
Abstract: Preference alignment has become a crucial technique for aligning large language models (LLMs) with human values. However, training on real human preference data raises privacy concerns, as these datasets often contain sensitive user prompts and human judgments. To address this, we propose **DPPrefSyn**, a novel algorithm for generating differentially private (DP) synthetic preference data to enable privacy-preserving preference alignment. DPPrefSyn addresses three key challenges: modeling diverse human preferences via DP clustering and per-cluster DP scoring models; reducing dimensionality with DP-PCA to improve efficiency; and conserving the privacy budget by leveraging public prompts. We conduct extensive experiments on three standard benchmarks, comparing our method with DP fine-tuning on real data, and show that our framework achieves competitive performance under strong privacy guarantees. These results open up new possibilities for privacy-preserving preference alignment across a broad range of applications. To the best of our knowledge, this is the first work to generate DP synthetic preference data for LLM alignment.
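The abstract names DP-PCA as the dimensionality-reduction step but does not spell out the mechanism. Below is a minimal, self-contained sketch of one standard way to make PCA differentially private (Gaussian noise added to a clipped covariance matrix, in the spirit of "Analyze Gauss"); the function name, the clipping scheme, and the sensitivity bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dp_pca(X, k, epsilon, delta, clip_norm=1.0, rng=None):
    """Sketch of DP-PCA: Gaussian mechanism on the empirical covariance.

    Each row of X is one record (e.g., an embedding). Rows are clipped to
    `clip_norm` so that replacing one record changes the covariance by at
    most 2 * clip_norm**2 / n in Frobenius norm (assumed adjacency notion).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Clip each row to bound per-record sensitivity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Empirical (uncentered) covariance matrix.
    cov = X.T @ X / n

    # Gaussian-mechanism noise calibrated to the covariance sensitivity.
    sensitivity = 2.0 * clip_norm ** 2 / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noise = rng.normal(0.0, sigma, size=(d, d))
    noise = (noise + noise.T) / 2.0  # symmetrize so eigenvectors stay real

    # Top-k eigenvectors of the noisy covariance give the DP projection.
    eigvals, eigvecs = np.linalg.eigh(cov + noise)
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return top_k  # d x k projection matrix
```

In such a scheme, only the covariance release consumes privacy budget; downstream steps (e.g., clustering or scoring in the projected space) would need their own DP accounting, consistent with the per-component budget split the abstract implies.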
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13000