LEP: Leveraging Local Entropy Pruning for Sparsity in Large Language Models

Published: 2025 · Last Modified: 08 Jan 2026 · ICASSP 2025 · CC BY-SA 4.0
Abstract: Large Language Models (LLMs) are being adopted rapidly in fields such as natural language processing and computer vision. However, because of their enormous parameter counts, the emergent capabilities that boost performance also incur substantial computational and storage costs, particularly during inference, which makes deploying large models in resource-constrained environments especially challenging. To address this issue, we propose an innovative pruning method based on local entropy that improves model performance while reducing computational cost. Our approach uses a small amount of calibration data and computes the importance of each weight from the local entropy of the weights and the input activations, and it applies to both unstructured and semi-structured pruning without complex weight updates or retraining. Experimental results demonstrate that our method performs well across models of various scales. Specifically, at a 50% sparsity rate, it significantly reduces model perplexity without any weight updates and outperforms existing pruning techniques such as SparseGPT and Wanda on zero-shot tasks. The code is available at: this https URL.
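To make the described pipeline concrete, below is a minimal sketch of a local-entropy-style pruning metric. The abstract does not give the exact LEP formula, so the specifics here are assumptions: local entropy is taken over groups of weight magnitudes along the input dimension, and importance is formed by modulating a Wanda-style magnitude-times-activation-norm score with that entropy. Function names (`local_entropy_scores`, `prune_unstructured`), the group size, and the score combination are illustrative, not the authors' implementation.

```python
import torch

def local_entropy_scores(W, X_norm, group_size=128, eps=1e-8):
    """Hedged sketch of a local-entropy pruning metric.

    Assumptions (not from the paper): entropy is computed over groups of
    |W| along the input dimension, then combined with the per-feature
    input activation norm, similar in spirit to Wanda's metric.

    W      : (out_features, in_features) weight matrix
    X_norm : (in_features,) L2 norm of each input feature over calibration data
    """
    out_f, in_f = W.shape
    absW = W.abs()
    # Split each output row into local groups and normalize the magnitudes
    # inside each group into a probability distribution.
    G = absW.reshape(out_f, in_f // group_size, group_size)
    p = G / (G.sum(dim=-1, keepdim=True) + eps)
    # Shannon entropy of each group, broadcast back to every weight in it.
    H = -(p * (p + eps).log()).sum(dim=-1, keepdim=True)
    H = H.expand_as(G).reshape(out_f, in_f)
    # Importance score: magnitude * activation norm, modulated by local entropy.
    return absW * X_norm.unsqueeze(0) * H

def prune_unstructured(W, scores, sparsity=0.5):
    """Zero out the lowest-scoring weights per output row (no weight update)."""
    k = int(W.shape[1] * sparsity)
    idx = scores.argsort(dim=1)[:, :k]        # indices of least-important weights
    mask = torch.ones_like(W)
    mask.scatter_(1, idx, 0.0)
    return W * mask

# Toy usage with random data standing in for calibration activations.
W = torch.randn(256, 512)
X = torch.randn(64, 512)                      # 64 calibration samples
scores = local_entropy_scores(W, X.norm(dim=0))
W_sparse = prune_unstructured(W, scores, sparsity=0.5)
print((W_sparse == 0).float().mean())         # ~0.5 sparsity per row
```

As in the abstract, this sketch requires only a forward pass over calibration data to collect activation norms and applies a one-shot mask with no retraining; a semi-structured (e.g., 2:4) variant would instead select the lowest-scoring weights within each block of four.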