Efficient Model Pruning for Large-Scale Deep Learning Models: Enhancing Performance and Reducing Computational Overhead

17 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Model Pruning, Large Language Models, Deep Learning Efficiency, Sparse Neural Networks, Computational Overhead Reduction, AI model compression, Inference Optimization
Abstract: Deep learning models, particularly large-scale language and vision architectures, are computationally intensive due to their large parameter counts and complex neural network designs. This paper presents an improved model pruning method aimed at reducing computational cost while maintaining performance comparable to unpruned models. By analyzing weights, biases, activations, and other key indicators, we propose a novel algorithm that identifies and removes neurons or connections that contribute minimally to the model's output quality. Our approach achieves higher pruning efficiency across a range of pruning ratios, resulting in smaller, faster, and more cost-effective models. Experimental results demonstrate that our method significantly outperforms state-of-the-art (SOTA) pruning techniques in both inference speed and memory usage, with negligible degradation in accuracy. This work contributes to the development of resource-efficient models suitable for deployment in environments with limited computational resources, paving the way for more scalable and sustainable deep-learning applications.
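The abstract describes removing low-contribution weights or neurons at a chosen pruning ratio. The paper's actual algorithm is not specified here; as a point of reference, a minimal sketch of the standard magnitude-based baseline it would be compared against might look like the following (the function name `magnitude_prune` and its interface are illustrative assumptions, not the authors' method):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, ratio: float) -> np.ndarray:
    """Zero out the fraction `ratio` of entries with smallest magnitude.

    This is generic unstructured magnitude pruning, shown only as a
    baseline sketch; the paper's proposed algorithm also considers
    biases and activations, which this example does not.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    flat = np.abs(weights).ravel()
    k = int(ratio * flat.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest magnitude; entries at or below it
    # are zeroed (ties at the threshold may remove slightly more than k).
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune half of a 2x2 weight matrix.
W = np.array([[0.1, -2.0], [0.05, 3.0]])
pruned = magnitude_prune(W, 0.5)  # the two smallest-magnitude entries become 0
```

Sweeping `ratio` and measuring accuracy on a held-out set is the usual way such a baseline produces the pruning-ratio curves the abstract alludes to.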
Primary Area: optimization
Supplementary Material: zip
Submission Number: 9705