The Super Weight in Large Language Models

Mengxia Yu; De Wang; Colorado Reed; Alvin Wan


13 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: natural language processing

TL;DR: We discover and study "super weights" in LLMs, which are very few in number yet crucial to model quality.

Abstract: Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so even a small fraction, such as 0.01%, translates to hundreds of thousands of parameters. In this work, we present an even more surprising finding: pruning as few as a single parameter can destroy an LLM’s ability to generate text, increasing perplexity by three orders of magnitude and reducing zero-shot accuracy to chance level. We propose a data-free method for identifying such parameters, termed super weights, using a single forward pass through the model. Additionally, we find that these super weights induce correspondingly rare and large activation outliers, termed super activations. When preserved with high precision, super activations can enhance simple round-to-nearest quantization, making it competitive with state-of-the-art methods. For weight quantization, we similarly find that by preserving the super weight and clipping other weight outliers, round-to-nearest quantization can scale to much larger block sizes than previously considered. To facilitate further research into super weights, we provide an index of super weight coordinates for common, openly available LLMs.
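
As a rough illustration of why one large outlier matters for round-to-nearest quantization, the following minimal Python sketch quantizes a toy block of weights twice: once as a whole, and once with the largest-magnitude entry held out in full precision. It is not the authors' method. The helper names (`rtn_quantize`, `rtn_with_outlier_held_out`, `mean_abs_error`), the symmetric 4-bit scheme, and the weight values are all assumptions chosen for illustration.

```python
def rtn_quantize(values, bits=4):
    """Symmetric round-to-nearest quantization, then dequantization.

    Hypothetical helper: one scale per block, set by the largest magnitude.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed integers
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    # Quantize to the integer grid, clamp, and map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def rtn_with_outlier_held_out(values, bits=4):
    """Quantize every entry except the largest-magnitude one, which stays exact."""
    idx = max(range(len(values)), key=lambda i: abs(values[i]))
    rest = values[:idx] + values[idx + 1:]
    dequantized = rtn_quantize(rest, bits)
    dequantized.insert(idx, values[idx])  # restore the outlier unchanged
    return dequantized


def mean_abs_error(original, reconstructed):
    """Mean absolute reconstruction error over the block."""
    return sum(abs(x - y) for x, y in zip(original, reconstructed)) / len(original)


# Toy block (made-up values): small weights plus a single large outlier.
weights = [0.02, -0.05, 0.01, 0.04, -0.03, 2.5, 0.03, -0.01]

plain = rtn_quantize(weights)
held = rtn_with_outlier_held_out(weights)

err_plain = mean_abs_error(weights, plain)
err_held = mean_abs_error(weights, held)

print(f"RTN over the whole block:   mean abs error = {err_plain:.5f}")
print(f"RTN with outlier held out:  mean abs error = {err_held:.5f}")
```

In this toy block, the outlier sets the quantization scale so coarsely that every small weight rounds to zero. Holding it out lets the remaining weights use the full integer range, which lowers the reconstruction error.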

Primary Area: foundation or frontier models, including LLMs


Submission Number: 310
