Abstract: Large language models have become crucial across various domains, yet their success comes at the cost of considerable computational and memory resources. Model pruning reduces the size of deep learning models by excising redundant elements. However, current pruning methods often fail to deliver substantial end-to-end acceleration. In this paper, we present MI-PRUN, a novel approach that uses mutual information to identify low-impact blocks for efficient model pruning. Furthermore, we incorporate the Data Processing Inequality to ensure that contiguous blocks essential to overall model performance are preserved rather than pruned by accident. We also develop the Fast-Block-Select algorithm to improve the efficiency of the pruning process. Comprehensive experiments show that our method surpasses previous state-of-the-art (SOTA) model pruning methods.
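To make the core idea concrete, below is a minimal sketch, not the authors' implementation, of scoring transformer blocks by an estimate of the mutual information between each block's input and output activations and flagging the lowest-impact blocks as pruning candidates. The histogram-based MI estimator, the mean-pooling of hidden states, the toy random activations, and the reading that high input-output MI indicates a low-impact (near-identity) block are all illustrative assumptions, not details taken from the paper.

```python
# Sketch: rank transformer blocks by an MI-based impact score (assumptions noted above).
import numpy as np

def histogram_mi(x: np.ndarray, y: np.ndarray, bins: int = 32) -> float:
    """Estimate I(X; Y) for two 1-D samples with a simple 2-D histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def block_scores(inputs: list[np.ndarray], outputs: list[np.ndarray]) -> list[float]:
    """One MI score per block, computed on mean-pooled per-token hidden states."""
    scores = []
    for x, y in zip(inputs, outputs):
        # Pool the hidden dimension so each token contributes one scalar sample.
        scores.append(histogram_mi(x.mean(axis=-1), y.mean(axis=-1)))
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_blocks, n_tokens, d_model = 12, 2048, 64  # toy sizes, not from the paper
    inputs = [rng.normal(size=(n_tokens, d_model)) for _ in range(n_blocks)]
    # Simulate some near-identity blocks: they barely transform their input,
    # so MI(input, output) stays high and they are natural pruning candidates.
    outputs = [x + rng.normal(scale=0.1 if i % 4 == 0 else 1.0, size=x.shape)
               for i, x in enumerate(inputs)]
    scores = block_scores(inputs, outputs)
    prune_candidates = np.argsort(scores)[::-1][:3]  # highest MI ~ least change
    print("per-block MI scores:", np.round(scores, 3))
    print("low-impact block indices:", prune_candidates.tolist())
```

In practice one would collect real hidden states from calibration data rather than random arrays, and the paper's Data Processing Inequality constraint and Fast-Block-Select procedure would further restrict which contiguous blocks may be removed; those components are not modeled in this sketch.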
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: pruning
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 461