Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Keywords: Large Language Model, Data Efficiency, Data Pruning, Token Pruning
TL;DR: We introduce the Error–Uncertainty (EU) Plane to capture data variation, and build on it to design Quadrant-based Tuning (Q-Tuning), the first joint sample and token pruning framework for efficient LLM fine-tuning that beats full-data training.
Abstract: As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies—high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the *Error–Uncertainty (EU) Plane*, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose *Quadrant-based Tuning (Q-Tuning)*, a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38\% average improvement over the full-data SFT baseline using only 12.5\% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT. The code is attached in the supplementary materials.
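The two-stage strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; several details are assumptions made only for illustration. The abstract does not say how the EU Plane is split into quadrants, so the sketch assumes median thresholds on per-sample error and uncertainty, treats high-error/low-uncertainty samples as "misconception" samples and low-error/high-uncertainty samples as "calibration" samples, and drops the other two quadrants. The scoring function `context_scores`, the `neighbor_weight` and `token_keep_ratio` parameters, and the input dictionary layout are likewise hypothetical.

```python
from statistics import median


def assign_quadrant(error, uncertainty, err_thr, unc_thr):
    """Map a sample to a label on the EU Plane (assumed mapping, not from the source)."""
    high_err = error >= err_thr
    high_unc = uncertainty >= unc_thr
    if high_err and not high_unc:
        return "misconception"  # assumed: confidently wrong
    if not high_err and high_unc:
        return "calibration"    # assumed: correct but unsure
    return "pruned"             # assumed: the other two quadrants are dropped


def context_scores(token_losses, neighbor_weight=0.5):
    """Hypothetical context-aware score: blend each token's loss with its neighbors' mean loss."""
    n = len(token_losses)
    scores = []
    for i, loss in enumerate(token_losses):
        neighbors = [token_losses[j] for j in (i - 1, i + 1) if 0 <= j < n]
        ctx = sum(neighbors) / len(neighbors) if neighbors else loss
        scores.append((1 - neighbor_weight) * loss + neighbor_weight * ctx)
    return scores


def q_tuning_sketch(samples, token_keep_ratio=0.5):
    """Stage 1: sample-level triage. Stage 2: asymmetric token pruning.

    samples: list of dicts with keys 'error', 'uncertainty', 'token_losses'.
    Returns a list of (sample_index, kept_token_indices) for retained samples.
    """
    err_thr = median(s["error"] for s in samples)
    unc_thr = median(s["uncertainty"] for s in samples)
    kept = []
    for idx, s in enumerate(samples):
        quadrant = assign_quadrant(s["error"], s["uncertainty"], err_thr, unc_thr)
        n = len(s["token_losses"])
        if quadrant == "calibration":
            # Calibration samples are preserved in their entirety.
            kept.append((idx, list(range(n))))
        elif quadrant == "misconception":
            # Only misconception samples have their less salient tokens trimmed.
            scores = context_scores(s["token_losses"])
            k = max(1, round(token_keep_ratio * n))
            top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
            kept.append((idx, sorted(top)))
    return kept
```

Calling `q_tuning_sketch` on a batch returns only the retained samples, each paired with the token positions that would still contribute to the loss. In an actual SFT loop, those positions would typically be turned into a loss mask; that step is omitted here.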
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1562