Synthetic Preference Interpolation for Language Model Alignment

ACL ARR 2025 May Submission2890 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Aligning large language models (LLMs) with human preferences is critical yet challenging. The most widely adopted alignment methods, such as those based on Direct Preference Optimization (DPO), train on pairwise preference data and have demonstrated promising results. However, these methods cannot fully exploit the rich information inherent in preference data, such as the intermediate quality levels between chosen and rejected samples. Motivated by this observation, we propose Synthetic Preference Interpolation Alignment (SPIA), a novel alignment algorithm that introduces interpolated synthetic preferences to better capture the nuances between samples of different quality levels. By constructing, from ordinary pairwise preference data, synthetic preference data that reflects intermediate quality levels, our method bridges the gap between binary pairwise comparisons and richer quality representations. Moreover, unlike other listwise optimization methods, our approach does not require stronger models for annotation, making it more practical and cost-effective. Our results demonstrate that SPIA not only outperforms existing methods on various benchmarks but also provides valuable insights into harnessing preference data for stronger human-aligned LLMs.
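
The abstract describes the idea of interpolated synthetic preferences but not their construction. As a minimal illustrative sketch only, the Python snippet below shows one plausible way to derive intermediate-quality responses from a single (chosen, rejected) pair, namely token-level mixing; the function name, the mixing scheme, and all parameters are hypothetical and are not the authors' actual SPIA procedure.

```python
import random


def interpolate_preferences(chosen_tokens, rejected_tokens, num_levels=3, seed=0):
    """Build a ranked list of responses spanning chosen -> rejected.

    Hypothetical construction: each intermediate level replaces a growing
    fraction of the chosen response's tokens with the rejected response's
    tokens, so quality degrades monotonically across the list. For
    simplicity, interpolations are truncated to the shorter response.
    """
    rng = random.Random(seed)
    n = min(len(chosen_tokens), len(rejected_tokens))
    levels = []
    for k in range(1, num_levels + 1):
        # Fraction of positions taken from the rejected response grows with k.
        frac = k / (num_levels + 1)
        swap_idx = set(rng.sample(range(n), int(frac * n)))
        mixed = [
            rejected_tokens[i] if i in swap_idx else chosen_tokens[i]
            for i in range(n)
        ]
        levels.append(mixed)
    # Full ranked list, best to worst: chosen > interpolations > rejected.
    return [chosen_tokens] + levels + [rejected_tokens]


if __name__ == "__main__":
    chosen = "the capital of France is Paris".split()
    rejected = "the capital of France is maybe Lyon I think".split()
    for rank, response in enumerate(interpolate_preferences(chosen, rejected)):
        print(rank, " ".join(response))
```

A ranked list produced this way could then feed a listwise preference objective without any extra annotation from a stronger model, which is the cost advantage the abstract claims over other listwise methods.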
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning, alignment
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 2890