Personalization Under Value Conflict: Resolving Contradictory Preferences with Paired Fine-Tuning

ICLR 2026 Conference Submission 19175 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Preference conflict, Personalization
TL;DR: We present a new dataset and training paradigm for aligning LLMs with diverse and even contradictory individual preferences, a step toward a single model that can adapt to any user's preferences under value conflict.
Abstract: Large language models (LLMs) are increasingly expected to capture not only broadly shared universal human values but also the diverse and often contradictory preferences of individual users. Existing alignment approaches typically optimize for a single preference direction, making them unsuitable when users switch between opposing values. We propose \textbf{Preference-Paired Fine-Tuning (PFT)}, a framework that trains models on paired contradictory preferences, enabling a single model to align with both sides simultaneously. Beyond handling one preference pair, PFT generalizes to multiple mutually exclusive preference dimensions, capturing shared structure across conflicts. With only a few in-context examples from user history, PFT further enables rapid and data-efficient customization, yielding stronger alignment to individual preferences. Experiments show that PFT achieves up to $\textbf{96.7\%}$ classification accuracy, improves open-ended generation scores by $\textbf{up to 20.05\%}$, and reduces data requirements by about $\textbf{40\%}$ compared to single-preference fine-tuning. These results highlight a scalable path toward conflict-aware and personalized LLMs.
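The abstract describes training on paired contradictory preferences so that one model can serve both sides of a conflict. The sketch below is purely illustrative and not from the submission: the data format, the preference tag convention (`[preference: dimension=A/B]`), and the field names are assumptions used to show how one contradictory pair could be expanded into two preference-conditioned supervised records.

```python
# Minimal sketch, assuming a tag-conditioned supervised fine-tuning setup.
# All names and the record format are illustrative assumptions, not the paper's method.
from dataclasses import dataclass
from typing import List, Dict


@dataclass
class PairedExample:
    prompt: str        # shared user prompt
    response_a: str    # response aligned with one side of the preference (e.g., concise)
    response_b: str    # response aligned with the opposing side (e.g., detailed)
    dimension: str     # the conflicting preference dimension (e.g., "verbosity")


def to_training_records(pair: PairedExample) -> List[Dict[str, str]]:
    """Expand one contradictory pair into two supervised records, one per side,
    each conditioned on an explicit preference tag so a single model sees both."""
    return [
        {"input": f"[preference: {pair.dimension}=A] {pair.prompt}", "target": pair.response_a},
        {"input": f"[preference: {pair.dimension}=B] {pair.prompt}", "target": pair.response_b},
    ]


# Example usage: the same prompt yields opposite-valued targets under opposite tags.
pair = PairedExample(
    prompt="Summarize the meeting notes.",
    response_a="Three bullet points covering decisions only.",
    response_b="A detailed paragraph covering context, discussion, and all decisions.",
    dimension="verbosity",
)
records = to_training_records(pair)
```

At inference time, the same tag slot could be filled from a few in-context examples of the user's history to select the side of the conflict to align with, which is the kind of data-efficient customization the abstract refers to.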
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19175