Abstract: Large language models (LLMs) have become crucial across various domains, but this comes at the expense of considerable computational and memory resources. Model pruning refines deep learning models by excising redundant elements; however, current pruning methods often fail to deliver substantial end-to-end acceleration. In this paper, we propose {MI-PRUN}, a novel approach that uses {mutual information} to identify low-impact blocks for efficient model pruning. Furthermore, we incorporate the {Data Processing Inequality (DPI)} to elucidate the relationship between the importance of contiguous blocks and that of individual blocks. We employ an iterative block-selection algorithm that continuously updates the combination of blocks with the least impact on model performance, thereby obtaining a globally optimal solution. To further improve pruning efficiency, we develop the {Fast-Block-Select} algorithm to accelerate the pruning process. Comprehensive experiments on a wide range of models and datasets demonstrate the rationality and effectiveness of our method.
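The abstract's core idea (scoring blocks by mutual information between their input and output representations, then selecting a contiguous window of low-impact blocks) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the MI estimator (a crude histogram estimate), the scoring direction (assuming blocks whose output shares high MI with their input are redundant), and all function names here are assumptions.

```python
import numpy as np

def mi_proxy(x, y, bins=8):
    # Crude histogram-based mutual information estimate between two
    # flattened activation vectors (illustrative; not the paper's estimator).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)          # marginal of x
    py = p.sum(axis=0, keepdims=True)          # marginal of y
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def block_redundancy(hidden_states):
    # hidden_states[i] is the activation entering block i;
    # hidden_states[i+1] is its output. High input/output MI is taken
    # here as a sign the block transforms little (assumed convention).
    return [mi_proxy(hidden_states[i].ravel(), hidden_states[i + 1].ravel())
            for i in range(len(hidden_states) - 1)]

def select_contiguous_blocks(scores, k):
    # Pick the window of k contiguous blocks with maximal total
    # redundancy, i.e. the candidate lowest-impact span to prune.
    best_start = max(range(len(scores) - k + 1),
                     key=lambda s: sum(scores[s:s + k]))
    return list(range(best_start, best_start + k))
```

An iterative variant, as the abstract describes, would re-evaluate the scores after each tentative removal rather than scoring once; the windowed selection above only illustrates the contiguous-block case motivated by the DPI.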
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: pruning
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Keywords: pruning
Submission Number: 1886