OASIS: An Optimized Approach to Systematic Calibration Data Selection

ICLR 2026 Conference Submission8339 Authors

17 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Pruning, Calibration Data Selection, Large Language Model
TL;DR: Random calibration data sabotages pruning; OASIS adaptively learns the right subset to stabilize and improve LLM pruning.
Abstract: Post-training pruning is a critical technique for compressing Large Language Models. However, as shown in prior research, its effectiveness is highly sensitive to the small set of calibration data used to estimate parameter importance. Current calibration data selection relies on simple heuristics such as random sampling or entropy, which often leads to suboptimal and inconsistent pruning outcomes: the same pruning method applied with different calibration data can produce up to 3× variance in post-pruning perplexity. In this work, we reveal the source of this inconsistency: calibration samples are not equally important; a quality hierarchy exists within any data pool. Not only does mixing high- and low-quality data degrade performance, but a sample's quality is also context-dependent, changing with the specific model and pruning algorithm; this renders static filtering infeasible and necessitates an adaptive solution. We therefore introduce OASIS, the first end-to-end framework that directly optimizes calibration data selection with respect to the pruned model's downstream performance. OASIS leverages a differentiable soft-mask proxy to propagate task-level gradients back to the calibration data, enabling dynamic discovery of the most beneficial subset. Experiments show that our approach improves the performance of diverse state-of-the-art pruning methods, establishing a new standard for data-aware model compression.
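The abstract's core mechanism, a differentiable soft-mask proxy whose task-level gradients flow back to learnable calibration-sample weights, can be illustrated with a minimal toy sketch. The sketch below is not the paper's implementation: the Wanda-style |W|·activation-norm importance score, the sigmoid relaxation of the top-k mask, and all names (`soft_mask`, `sample_logits`, the toy linear layer) are illustrative assumptions used to show how gradients can reach the data-selection variables.

```python
# Illustrative sketch only: a toy, single-layer stand-in for gradient-based
# calibration data selection. The importance proxy and soft-mask relaxation
# are assumptions, not the paper's actual OASIS implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: one frozen linear layer stands in for an LLM weight matrix.
d_in, d_out, n_pool, sparsity = 32, 16, 64, 0.5
W = torch.randn(d_out, d_in)                       # frozen layer weights
calib_pool = torch.randn(n_pool, d_in)             # candidate calibration samples
task_x = torch.randn(128, d_in)                    # held-out task inputs
task_y = torch.randn(128, d_out)                   # held-out task targets

sample_logits = torch.zeros(n_pool, requires_grad=True)   # learnable selection scores
opt = torch.optim.Adam([sample_logits], lr=0.1)

def soft_mask(importance, keep_ratio, temperature=0.1):
    """Differentiable relaxation of a top-k keep mask via a sigmoid around the threshold."""
    k = int(keep_ratio * importance.numel())
    threshold = importance.flatten().kthvalue(importance.numel() - k + 1).values
    return torch.sigmoid((importance - threshold) / temperature)

for step in range(100):
    # Weight calibration samples by their (softmaxed) selection scores.
    w = F.softmax(sample_logits, dim=0)                          # (n_pool,)
    # Wanda-style importance proxy: |W| * weighted input activation norm.
    act_norm = torch.sqrt((w[:, None] * calib_pool ** 2).sum(0)) # (d_in,)
    importance = W.abs() * act_norm                              # (d_out, d_in)
    mask = soft_mask(importance, keep_ratio=1.0 - sparsity)
    # Task-level loss of the softly pruned layer drives the sample weights.
    pred = task_x @ (W * mask).T
    loss = F.mse_loss(pred, task_y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The highest-scoring samples form the discovered calibration subset.
selected = torch.topk(sample_logits, k=8).indices
print("selected calibration samples:", selected.tolist())
```

In this toy version, the pruning mask is a smooth function of the sample weights, so minimizing a downstream task loss directly reshapes which calibration samples matter; a real pipeline would apply the same idea per layer of an actual LLM and then hand the selected subset to a standard (hard-mask) pruning method.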
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8339