Efficient Model Pruning for Large-Scale Deep Learning Models: Enhancing Performance and Reducing Computational Overhead
Keywords: Model Pruning, Large Language Models, Deep Learning Efficiency, Sparse Neural Networks, Computational Overhead Reduction, AI Model Compression, Inference Optimization
Abstract: Deep learning models, particularly large-scale language and vision architectures, are computationally intensive due to their extensive number of parameters and complex neural network designs. This paper presents an improved method for model pruning aimed at reducing the computational burden while maintaining performance comparable to unpruned models. By analyzing weights, biases, activations, and other key indicators, we propose a novel algorithm that identifies and removes neurons or connections with minimal contribution to the model's output quality. Our approach achieves higher pruning efficiency than existing approaches across various pruning ratios, resulting in smaller, faster, and more cost-effective models. Experimental results demonstrate that our method significantly outperforms state-of-the-art (SOTA) pruning techniques in both inference speed and memory usage, with negligible degradation in accuracy. This work contributes to the development of resource-efficient models suitable for deployment in environments with limited computational resources, paving the way for more scalable and sustainable deep learning applications.
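For readers unfamiliar with importance-based pruning, the sketch below shows a generic magnitude-based baseline of the kind the abstract alludes to (removing connections with minimal contribution). It is not the paper's algorithm; the importance criterion (L1 magnitude), the per-layer pruning ratio, and the example model are illustrative assumptions, using PyTorch's built-in pruning utilities.

```python
# Generic magnitude-based unstructured pruning baseline (illustrative only;
# not the method proposed in this paper).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for a much larger network (assumption for this sketch).
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Remove the 50% smallest-magnitude weights in each Linear layer.
# The 0.5 ratio and the L1 criterion are assumptions, not the paper's settings.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# Report the resulting overall sparsity.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall sparsity: {zeros / total:.2%}")
```

In practice, such sparsity only translates into faster inference and lower memory use when paired with sparse kernels or structured pruning, which is the kind of gap the proposed method is positioned to address.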
Primary Area: optimization
Supplementary Material: zip
Submission Number: 9705