Keywords: Instruction Tuning, Data Selection, Efficiency, Post-Training
Abstract: High-quality supervised finetuning (SFT) data are essential for unlocking pretrained LLMs’ capabilities. Typically, instructions are paired with responses from various sources—human annotators or other LMs—which are often out of the distribution of the target model to be finetuned. At scale, this mismatch can lead to diminishing returns and even hurt model performance and robustness. We hypothesize that SFT is most effective when the data are aligned with the model’s pretrained distribution, and propose **GRAPE**—a novel SFT framework that tailors supervision to the target model. For each instruction, it **g**athers **r**esponses from various sources and selects the one that **a**ligns most closely with the model’s **pre**trained distribution, as measured by its normalized probability under the target model. Standard SFT is then performed on these selected responses.
We first evaluate GRAPE in a controlled experiment, sampling multiple responses from diverse models for each question in the UltraInteract dataset. We then finetune LMs from different families—including LLaMA-3.1-8B, Mistral-7B, and Qwen2.5-7B—on GRAPE-selected data. GRAPE significantly outperforms strong baselines—including distilling from the strongest model—with absolute gains of up to **13.8%** averaged across benchmarks, and outperforms a baseline trained on 3× more data with improvements of up to **17.3%**.
GRAPE's benefits generalize to off-the-shelf SFT data. When used to subsample the post-training data of Tulu3 and OLMo-2, GRAPE surpasses strong baselines trained on 4.5× more data by **6.1%**, and outperforms state-of-the-art selection methods by **3.9%** on average. Notably, with only **1/3 of the data** and **half the training epochs**, GRAPE enables LLaMA-3.1-8B to **exceed Tulu3-SFT performance by 3.5%**.
Our findings highlight that aligning supervision with the pretrained distribution provides a simple yet powerful strategy to improve both the **efficiency** and **effectiveness** of SFT.
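The selection criterion described in the abstract—keeping, for each instruction, the candidate response with the highest normalized probability under the target model's pretrained checkpoint—can be sketched in a few lines. The snippet below is a minimal illustration under our own assumptions (HuggingFace `transformers`, a base non-instruct checkpoint, and length-normalized log-probability as the score), not the authors' released implementation.

```python
# Minimal sketch of GRAPE-style response selection (illustrative only; the exact
# scoring and preprocessing in the paper may differ). Each candidate response is
# scored by its average per-token log-probability under the *pretrained* model,
# conditioned on the instruction, and the highest-scoring response is kept for SFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # any pretrained (non-instruct) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def normalized_logprob(instruction: str, response: str) -> float:
    """Length-normalized log-probability of `response` given `instruction`."""
    prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
    full_ids = tokenizer(instruction + response, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Log-probability of each token given its prefix (shift targets by one position).
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the response tokens (boundary is approximate at the tokenizer level)
    # and normalize by the number of response tokens.
    resp_start = prompt_ids.shape[1] - 1
    return token_logprobs[:, resp_start:].mean().item()

def select_response(instruction: str, candidates: list[str]) -> str:
    """Pick the candidate most aligned with the pretrained distribution."""
    return max(candidates, key=lambda r: normalized_logprob(instruction, r))
```

In this sketch, selected (instruction, response) pairs would then be passed to a standard SFT pipeline; length normalization is used so that longer responses are not penalized simply for having more tokens.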
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 23627