X-Pruner: An Adaptive Pruning Method with Self-Compensation Driven by Reinforcement Learning for Language Models

ICLR 2026 Conference Submission19016 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Model Compression, Efficient AI, Pruning, Language Model
Abstract: As small language models (SLMs) become the backbone of on-device, mobile, and edge deployments, their constrained computational and memory budgets necessitate aggressive yet reliable pruning. Compared with their larger counterparts, SLMs are more sensitive to parameter removal, making the design of robust pruning strategies particularly challenging. Existing post-training pruning techniques, predominantly designed for large language models (LLMs), rely on static criteria computed from tiny calibration sets and often generalize poorly. In this paper, we present X-Pruner, an unstructured adaptive pruning framework featuring a variable-exponent importance metric. To unlock its full potential, we introduce a reinforcement learning-based search algorithm that efficiently identifies optimal parameter configurations. We further reveal that the pruning path itself influences post-pruning performance and propose a self-compensation mechanism that rectifies pruning-induced errors through layer-wise adaptive adjustments; grounded in this insight, we also devise a unified path-scoring function to evaluate and select optimal pruning sequences across diverse target models. Extensive experiments on multiple language benchmarks demonstrate that X-Pruner consistently surpasses state-of-the-art post-training pruning techniques under comparable settings, achieving superior performance without any retraining, and in certain cases even outperforms approaches that update weights.
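The abstract does not give the exact form of the variable-exponent importance metric, but a common family of post-training criteria scores each weight by combining its magnitude with a calibration-activation norm. As a hedged illustration only, the sketch below assumes a Wanda-style score |W|^alpha · ||X||^beta with tunable per-layer exponents (alpha, beta), which an RL search such as the one described could then optimize; all function names and the specific metric form are assumptions, not X-Pruner's actual definition.

```python
import numpy as np

def importance_scores(W, X, alpha=1.0, beta=1.0):
    """Hypothetical variable-exponent importance metric (not the paper's exact form).

    Assumes a Wanda-style score |W|^alpha * ||X||^beta, where the exponents
    alpha and beta would be chosen per layer (e.g., by an RL-based search).
    W: (out, in) weight matrix; X: (n_samples, in) calibration activations.
    """
    act_norm = np.linalg.norm(X, axis=0)           # per-input-channel L2 norm
    return np.abs(W) ** alpha * act_norm ** beta   # elementwise score, shape (out, in)

def unstructured_prune(W, X, sparsity=0.5, alpha=1.0, beta=1.0):
    """Zero out the lowest-scoring fraction of weights (unstructured pruning)."""
    scores = importance_scores(W, X, alpha, beta)
    k = int(sparsity * W.size)                     # number of weights to remove
    thresh = np.partition(scores.ravel(), k)[k]    # k-th smallest score as cutoff
    mask = scores >= thresh                        # keep weights at or above cutoff
    return W * mask, mask
```

Sweeping (alpha, beta) per layer and evaluating the pruned model turns the static criterion into the kind of searchable configuration space the abstract attributes to the RL component.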
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 19016