Posterior Restoration for Enhanced LLM Pruning

05 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | License: CC BY 4.0
Keywords: large language model, deep neural network pruning, post-training pruning
Abstract: Pruning compresses and accelerates deep neural networks, making it important for deploying large language models (LLMs). However, traditional pruning methods, which evaluate weight importance with prior criteria computed on the dense model, face two key limitations: (1) they often overfit to calibration data and (2) they ignore weight interactions, leading to inaccurate importance estimation. To address these limitations, we propose posterior restoration, a simple two-stage approach. First, we apply a conventional prior criterion to generate an initial coarse pruning mask. Second, we restore the most important removed weights, guided by our novel posterior criteria (magnitude, global, and local), which re-evaluate the removed weights from the perspective of the already-pruned model. This viewpoint mitigates overfitting and captures previously ignored weight interactions. Because the first stage accepts any conventional prior criterion, the scheme can be integrated with most existing pruning methods and enhance them. Experiments on Llama-3.1-8B and Mistral-7B across unstructured, channel-wise, and 2:4 sparsity patterns demonstrate that posterior restoration generally enhances pruned model performance. Our results show that the data-independent posterior magnitude criterion effectively mitigates overfitting, while the posterior global and local criteria successfully capture weight interactions.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 2343
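
The two-stage procedure described in the abstract (a coarse mask from a prior criterion, then restoration of removed weights ranked by a posterior criterion) can be sketched for unstructured sparsity as below. This is an illustrative sketch only, not the authors' implementation. The abstract does not define the posterior criteria or say how many weights are restored, so the function name `prune_then_restore`, the `overprune` margin, and the plain-magnitude posterior score used in the demo are all assumptions made for illustration.

```python
import numpy as np


def prune_then_restore(weights, prior_scores, posterior_score_fn,
                       target_sparsity=0.5, overprune=0.1):
    """Return a boolean keep-mask built in two stages (hypothetical sketch).

    Stage 1 prunes past the target sparsity using prior scores from the
    dense model. Stage 2 re-scores only the removed weights, given the
    coarse mask, and restores the best ones until the target is met.
    The overprune margin is an assumption, not taken from the paper.
    """
    n = weights.size
    n_keep_final = n - int(round(target_sparsity * n))
    coarse_sparsity = min(target_sparsity + overprune, 1.0)
    n_keep_coarse = n - int(round(coarse_sparsity * n))

    # Stage 1: keep the weights with the highest prior scores.
    order = np.argsort(prior_scores, axis=None)[::-1]
    mask = np.zeros(n, dtype=bool)
    mask[order[:n_keep_coarse]] = True
    mask = mask.reshape(weights.shape)

    # Stage 2: score removed weights from the pruned model's viewpoint.
    post = np.asarray(posterior_score_fn(weights, mask), dtype=float)
    post = np.where(mask, -np.inf, post)  # already-kept weights are not candidates
    n_restore = n_keep_final - n_keep_coarse
    if n_restore > 0:
        restore = np.argsort(post, axis=None)[::-1][:n_restore]
        mask.flat[restore] = True
    return mask


# Demo with a deliberately poor prior (it prefers small weights) so that
# the effect of the restoration stage is visible in the final mask.
W = np.arange(1.0, 17.0).reshape(4, 4)
demo_mask = prune_then_restore(
    W,
    prior_scores=-W,
    posterior_score_fn=lambda w, m: np.abs(w),  # stand-in posterior score
    target_sparsity=0.5,
    overprune=0.25,
)
```

Passing the coarse mask into `posterior_score_fn` is what would let a data-dependent posterior score account for which weights are already gone; the magnitude stand-in used here ignores the mask and needs no calibration data.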