Track: tiny / short paper (up to 4 pages)
Keywords: Structured Pruning, Parameter Sparsity, Random Pruning, Model Pruning, Model Compression, Large Language Models
TL;DR: Randomly pruning neurons in LLMs works surprisingly well, especially at lower pruning ratios, and can be combined with activation-based pruning for efficient and competitive results.
Abstract: This paper investigates the structured pruning of large language models (LLMs). We find that random pruning, despite its simplicity, is a surprisingly effective baseline, particularly at lower pruning ratios. We further propose a simple and efficient method that combines randomness with existing pruning heuristics. Specifically, our method combines random neuron clustering with activation-magnitude pruning, and achieves performance comparable to gradient-based methods while being significantly more efficient (up to 50x faster). Our code is available at https://github.com/Tim-Siu/llm-random-prune.
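The abstract does not specify how the random clustering and the activation-magnitude criterion interact, so the following is only a minimal sketch of one plausible reading: randomly partition a layer's neurons into clusters, score neurons by mean absolute activation on calibration data, and keep the top-scoring neurons within each cluster. The function name, arguments, and PyTorch implementation details are illustrative assumptions, not the authors' released code.

```python
import torch

def random_cluster_activation_prune(weight, activations, prune_ratio, num_clusters, seed=0):
    """Hypothetical sketch of random clustering + activation-magnitude pruning.

    weight:       (out_features, in_features) weight matrix of a linear layer
    activations:  (num_tokens, out_features) calibration activations of that layer
    prune_ratio:  fraction of output neurons to remove overall
    """
    out_features = weight.shape[0]
    gen = torch.Generator().manual_seed(seed)

    # Random clustering: shuffle neuron indices and split them into groups.
    perm = torch.randperm(out_features, generator=gen)
    clusters = torch.chunk(perm, num_clusters)

    # Activation-magnitude score per neuron (mean absolute activation).
    scores = activations.abs().mean(dim=0)

    keep = []
    for cluster in clusters:
        k = int(round(len(cluster) * (1 - prune_ratio)))
        # Within each random cluster, keep the top-k neurons by activation magnitude.
        top = torch.topk(scores[cluster], k).indices
        keep.append(cluster[top])

    keep = torch.cat(keep).sort().values
    # Return the structurally pruned weight and the surviving neuron indices.
    return weight[keep], keep
```

Setting `num_clusters = 1` would reduce this sketch to pure activation-magnitude pruning, while pruning entire randomly chosen clusters instead would recover purely random structured pruning; the paper's actual formulation may differ.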
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and URLs.
Submission Number: 38