Abstract: Neural network pruning has become increasingly important due to the growing complexity of neural network models and their widespread use across many fields. However, most pruning algorithms are either architecture-specific or extremely complex, relying on elaborate algebraic or geometric calculations to eliminate parameters. Additionally, many algorithms are only applicable during the training or testing phase, and their effects are lost once the model stops running.
This article presents KEN: a simple, unstructured, magnitude-based pruning algorithm built on Kernel Density Estimation (KDE). KEN is designed to create a condensed transformer model that can be easily injected back into its pre-trained version for downstream tasks. We tested KEN on five different transformer models and observed that it matches the performance of the original models while reducing their weights by at least 25% on average. Moreover, we compared KEN with other pruning algorithms and found that it outperforms them even when retaining fewer parameters. Finally, KEN allows the compact model to be downloaded, reducing storage requirements, and injected back into its pre-trained version at any time.
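The core idea, as described above, is to use a kernel density estimate over a layer's weights to decide which parameters are most representative and keep only those. The sketch below illustrates one plausible reading of that selection rule: fit a KDE over a weight vector, retain the weights whose values fall in the highest-density regions, and zero out the rest. This is a minimal illustration under our own assumptions (the function name `kde_prune_row`, the per-row granularity, and the highest-density selection criterion are ours), not the authors' actual KEN implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_prune_row(row, keep_ratio=0.75):
    """Sketch of KDE-guided selection (hypothetical, not the KEN paper's code):
    keep the weights whose values lie in the highest-density regions of the
    row's estimated distribution, and zero out the rest."""
    kde = gaussian_kde(row)              # fit a 1-D KDE over the weight values
    density = kde(row)                   # estimated density at each weight
    k = max(1, int(len(row) * keep_ratio))
    keep = np.argsort(density)[-k:]      # indices of the k densest weights
    pruned = np.zeros_like(row)
    pruned[keep] = row[keep]             # retained weights; others stay zero
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=64)                  # toy stand-in for one weight row
pruned = kde_prune_row(w, keep_ratio=0.75)
print(np.count_nonzero(pruned))          # 48 of 64 weights retained
```

Under this reading, "injecting" the compact model into its pre-trained version would amount to overwriting the retained positions of the pre-trained weight matrices with the stored nonzero values, which is why only the surviving weights (and their indices) need to be saved.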
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches to low-resource settings, Approaches for low compute settings-efficiency
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.