Keywords: Pruning, Machine Learning, Large Language Model, Deep Learning
TL;DR: We present a mathematical derivation of the pruning problem that recovers a previous method (Wanda), and we extend it to a more general setting by proposing a new pruning method (STADE).
Abstract: Large Language Models (LLMs) have become widespread and are used to solve a wide variety of tasks. To successfully handle many of these tasks, LLMs require longer training times and larger model sizes, which makes them ideal candidates for pruning methods that reduce computational demands while maintaining performance. Earlier methods require a retraining phase after pruning to recover the original model's performance, whereas state-of-the-art pruning methods, such as Wanda, prune the model without retraining, making the pruning process faster and more efficient. Building upon Wanda's work, this study provides a theoretical explanation of why the method is effective and leverages these insights to enhance the pruning process. Specifically, a theoretical analysis of the pruning problem reveals a common scenario in Machine Learning where Wanda is the optimal pruning method. Furthermore, this analysis reveals cases where Wanda is no longer optimal. To tackle those cases, we develop a new method, STADE, based on the standard deviation of the input. From a theoretical and empirical standpoint, STADE demonstrates better generality across different scenarios. Finally, extensive experiments on Qwen, Llama, and Open Pre-trained Transformers (OPT) models validate these theoretical findings, showing that, depending on the training conditions, Wanda's optimal performance varies as predicted by the theoretical framework.
These insights contribute to a more robust understanding of pruning strategies and their practical implications.
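As a minimal sketch of the kind of pruning criteria discussed above: Wanda scores each weight by its magnitude times the activation norm of its input feature, and a standard-deviation-based variant (in the spirit of STADE; the paper's exact criterion may differ, so treat `stade_scores` as a hypothetical illustration) swaps the norm for the per-feature standard deviation of the calibration inputs.

```python
import numpy as np

def wanda_scores(W, X):
    """Wanda-style scores: |W_ij| * ||X_j||_2, where X holds
    calibration inputs with shape (n_samples, in_features)."""
    return np.abs(W) * np.linalg.norm(X, axis=0)

def stade_scores(W, X):
    """Hypothetical STADE-style scores using the per-feature standard
    deviation of the input instead of its L2 norm (an assumption for
    illustration; the paper's exact criterion may differ)."""
    return np.abs(W) * X.std(axis=0)

def prune_by_scores(W, scores, sparsity=0.5):
    """Zero out the lowest-scoring fraction of weights per output row,
    matching Wanda's per-output comparison group."""
    W = W.copy()
    k = int(W.shape[1] * sparsity)
    if k == 0:
        return W
    idx = np.argsort(scores, axis=1)[:, :k]  # lowest-score weight indices
    np.put_along_axis(W, idx, 0.0, axis=1)
    return W
```

With 50% sparsity, each output row of the weight matrix keeps its half of the weights with the highest scores; only the scoring function differs between the two criteria.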
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 9118