Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: LLM alignment, particularly DPO, suffers from examples that are too difficult for the model to handle.
Abstract: The alignment of large language models (LLMs) often assumes that more clean data yields better outcomes, overlooking the match between model capacity and example difficulty. Challenging this assumption, we propose a new principle: *Preference data vary in difficulty, and overly difficult examples hinder alignment by exceeding the model's capacity*. Through systematic experimentation, we validate this principle with three key findings: (1) preference examples vary in difficulty, as evidenced by consistent learning orders across alignment runs; (2) overly difficult examples significantly degrade performance across four LLMs and two datasets; and (3) a model's capacity dictates its threshold for handling difficult examples, underscoring a critical relationship between data selection and model capacity. Building on this principle, we introduce *Selective DPO*, which filters out overly difficult examples. This simple adjustment improves alignment performance by 9–16% in win rate on the AlpacaEval 2 benchmark over the DPO baseline, surpassing a series of DPO variants with different algorithmic adjustments. Together, these results illuminate the importance of aligning data difficulty with model capacity, offering a transformative perspective for improving alignment strategies in LLMs. Code is available at https://github.com/glorgao/SelectiveDPO.
Lay Summary: Large language model (LLM) alignment is not a *more data is always better* game. We discover that examples are learned in a consistent order across different runs and training datasets, reflecting an intrinsic difficulty tied to model capacity (quantified by validation loss), and that the hardest slice, the examples lying beyond the model's reach, actually degrades alignment performance. Dropping these hardest cases and training only on the rest is a tiny change but a big win. On the AlpacaEval 2 benchmark, this cut-down curriculum raises the model's win rate by 9–16 points, beating a host of more complicated DPO variants. The lesson is clear: tune a model on what it can realistically learn.
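
The selection step described above can be summarized in a few lines. The sketch below is illustrative only and is not the authors' released implementation (see the linked repository for that): it assumes a `validation_loss` callable that scores each preference pair, for example with the DPO loss of a reference checkpoint as suggested by the lay summary, and a hypothetical `keep_fraction` knob for how much of the easiest data to retain.

```python
def select_easy_examples(dataset, validation_loss, keep_fraction=0.5):
    """Rank preference pairs by a proxy difficulty score and keep only the easiest slice.

    dataset:         iterable of preference examples (chosen/rejected pairs)
    validation_loss: callable mapping an example to a scalar difficulty proxy
    keep_fraction:   hypothetical fraction of the easiest examples to retain
    """
    scored = sorted(dataset, key=validation_loss)   # lowest loss (easiest) first
    cutoff = int(len(scored) * keep_fraction)       # drop the hardest tail
    return scored[:cutoff]                          # train DPO only on this subset

# Example usage with a toy scoring function (hypothetical field name):
# easy_subset = select_easy_examples(pairs, validation_loss=lambda ex: ex["ref_dpo_loss"])
```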
Link To Code: https://github.com/glorgao/SelectiveDPO
Primary Area: Deep Learning->Large Language Models
Keywords: alignment, data difficulty, large language models
Submission Number: 12650