Abstract: Large language models (LLMs) have become crucial across various domains, but this comes at the expense of considerable computational and memory resources. Model pruning refines deep learning models by excising redundant elements; however, current pruning methods often fail to deliver substantial end-to-end acceleration. In this paper, we propose {MI-PRUN}, a novel approach that uses {mutual information} to identify low-impact blocks for efficient model pruning. Furthermore, we incorporate the {Data Processing Inequality (DPI)} to elucidate the relationship between the importance of contiguous blocks and that of individual blocks. We employ an iterative block-selection algorithm that continuously updates the combination of blocks with the least impact on model performance, thereby obtaining a globally optimal solution. To further improve pruning efficiency, we develop the {Fast-Block-Select} algorithm to accelerate the pruning process. Comprehensive experiments on a wide range of models and datasets demonstrate the rationality and effectiveness of our method.
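The abstract's core idea (scoring blocks by mutual information between their input and output representations, then selecting a contiguous window of low-impact blocks) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the MI estimator (a crude histogram estimate), the scoring direction (assuming blocks whose output shares high MI with their input are redundant), and all function names here are assumptions.

```python
import numpy as np

def mi_proxy(x, y, bins=8):
    # Crude histogram-based mutual information estimate between two
    # flattened activation vectors (illustrative; not the paper's estimator).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)          # marginal of x
    py = p.sum(axis=0, keepdims=True)          # marginal of y
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def block_redundancy(hidden_states):
    # hidden_states[i] is the activation entering block i;
    # hidden_states[i+1] is its output. High input/output MI is taken
    # here as a sign the block transforms little (assumed convention).
    return [mi_proxy(hidden_states[i].ravel(), hidden_states[i + 1].ravel())
            for i in range(len(hidden_states) - 1)]

def select_contiguous_blocks(scores, k):
    # Pick the window of k contiguous blocks with maximal total
    # redundancy, i.e. the candidate lowest-impact span to prune.
    best_start = max(range(len(scores) - k + 1),
                     key=lambda s: sum(scores[s:s + k]))
    return list(range(best_start, best_start + k))
```

An iterative variant, as the abstract describes, would re-evaluate the scores after each tentative removal rather than scoring once; the windowed selection above only illustrates the contiguous-block case motivated by the DPI.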
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: pruning
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Keywords: pruning
Submission Number: 1886