R.I.P.: Better Models by Survival of the Fittest Prompts

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC 4.0
TL;DR: Rejecting Instruction Preferences (RIP), a new data selection and synthetic data creation method, dramatically improves model performance by filtering out 77% of training examples.
Abstract: Training data quality is one of the most important drivers of final model quality. In this work, we introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high variance and low quality responses. This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair. Our method, Rejecting Instruction Preferences (RIP), can be used to filter prompts from existing training sets or to build high quality synthetic datasets, yielding large performance gains across various benchmarks compared to unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, moving the model from 18th to 6th place overall on the leaderboard.
Lay Summary: The quality of training data is crucial for building powerful AI models. In this work, we introduce a simple yet effective way to improve training data by identifying and removing bad prompts, i.e., questions or instructions that lead to poor or inconsistent answers. Our method, called Rejecting Instruction Preferences (RIP), looks at two signals that are indicative of the quality of a given instruction: 1) Whether a model can produce really bad answers for a particular instruction. If it can, that is a warning sign: the instruction itself might be unclear, misleading, or too open-ended. In other words, the worse the model could do, the more likely the instruction is low quality. 2) How much better the model's preferred answer is compared to a rejected one. If the gap is large, the responses are highly inconsistent, which suggests the prompt may be confusing or low quality. By filtering out such prompts, RIP helps models learn from clearer, more useful examples. This leads to much better performance across a wide range of tests. For example, using RIP with Meta's Llama 3.1 models boosted scores by up to 10% on several standard evaluations. On a competitive leaderboard, our approach moved a top-tier model from 18th to 6th place.
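The filtering rule can be sketched in a few lines. The snippet below is a minimal illustration of the idea described above, assuming each prompt already has a chosen/rejected response pair scored by a reward model; the field names and threshold values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen_reward: float    # reward-model score of the preferred response
    rejected_reward: float  # reward-model score of the rejected response

def rip_filter(pairs, min_rejected_reward=0.3, max_reward_gap=0.5):
    """Illustrative RIP-style filter: keep prompts whose rejected response
    still scores reasonably well and whose chosen-vs-rejected reward gap is
    small (low variance); drop the rest. Thresholds are placeholders."""
    kept = []
    for p in pairs:
        gap = p.chosen_reward - p.rejected_reward
        if p.rejected_reward >= min_rejected_reward and gap <= max_reward_gap:
            kept.append(p)
    return kept
```

In this sketch, a prompt survives only if even its worst (rejected) response is decent and the two responses do not diverge too much, matching the intuition that low-quality prompts yield low-reward, high-variance answers.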
Primary Area: Deep Learning->Large Language Models
Keywords: Data Filtering, Preference Optimization, Synthetic Data
Submission Number: 4801