Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

Shaobo Wang; Jiaming Wang; Jiajun Zhang; Cong Wang; Yue Min; Zichen Wen; Fei Huang; Huiqiang Jiang; Junyang Lin; Dayiheng Liu; Linfeng Zhang

Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng Zhang

03 Sept 2025 (modified: 12 Feb 2026)ICLR 2026 Conference Desk Rejected SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Large Language Model, Data Efficiency, Data Pruning, Token Pruning

TL;DR: We introduce the Error–Uncertainty (EU) Plane to capture data variation, and build on it to design Quadrant-based Tuning (Q-Tuning), the first joint sample and token pruning framework for efficient LLM fine-tuning that beats full-data training..

Abstract: As supervised fine-tuning (SFT) evolves from a lightweight post-training step into a compute-intensive phase rivaling mid-training in scale, data efficiency has become critical for aligning large language models (LLMs) under tight budgets. Existing data pruning methods suffer from a fragmented design: they operate either at the sample level or the token level in isolation, failing to jointly optimize both dimensions. This disconnect leads to significant inefficiencies—high-value samples may still contain redundant tokens, while token-level pruning often discards crucial instructional or corrective signals embedded in individual examples. To address this bottleneck, we introduce the *Error–Uncertainty (EU) Plane*, a diagnostic framework that jointly characterizes the heterogeneous utility of training data across samples and tokens. Guided by this insight, we propose *Quadrant-based Tuning (Q-Tuning)*, a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning employs a two-stage strategy: first, it performs sample-level triage to retain examples rich in informative misconceptions or calibration signals; second, it applies an asymmetric token-pruning policy, using a context-aware scoring mechanism to trim less salient tokens exclusively from misconception samples while preserving calibration samples in their entirety. Our method sets a new state of the art across five diverse benchmarks. Remarkably, on SmolLM2-1.7B, Q-Tuning achieves a +38\% average improvement over the full-data SFT baseline using only 12.5\% of the original training data. As the first dynamic pruning approach to consistently outperform full-data training, Q-Tuning provides a practical and scalable blueprint for maximizing data utilization in budget-constrained LLM SFT. The code is attached in the supplementary materials.

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 1562

Loading