NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Published: 05 Mar 2025, Last Modified: 21 Apr 2025 · SLLM · CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: Quantization, Vector Quantization, LLMs, Compression, Sparsity, Pruning
Abstract: Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided compression), a unified framework for zero-shot shape-preserving compression algorithms. We compressed the Llama-2 7B/13B/70B and Llama-3 8B models using two popular forms of shape-preserving compression: vector quantization (NoWag-VQ, NoWag for Vector Quantization) and unstructured/structured pruning (NoWag-P, NoWag for Pruning). We found that NoWag-VQ significantly outperforms state-of-the-art zero-shot VQ methods, and that NoWag-P performs competitively against state-of-the-art pruning methods.
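The abstract describes a normalize-then-compress pipeline that keeps each weight matrix's original shape. Below is a minimal, hypothetical sketch of that general idea (not the authors' code): weights are scaled by assumed per-channel activation statistics, compressed by either magnitude pruning or a toy vector quantizer, and then rescaled. All function names, the specific normalization, and the hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of shape-preserving, activation-aware compression.
# Not the NoWag algorithm; an illustration under stated assumptions.
import numpy as np

def activation_scales(X, eps=1e-8):
    """Per-input-channel scale from activation second moments (assumed form)."""
    return np.sqrt((X ** 2).mean(axis=0)) + eps

def prune_magnitude(W_hat, sparsity=0.5):
    """Unstructured pruning: zero the smallest-magnitude entries, keep the shape."""
    k = int(W_hat.size * sparsity)
    thresh = np.partition(np.abs(W_hat).ravel(), k)[k]
    return np.where(np.abs(W_hat) < thresh, 0.0, W_hat)

def vector_quantize(W_hat, dim=4, n_codes=16, iters=10, seed=0):
    """Toy k-means VQ over length-`dim` sub-vectors; the matrix shape is preserved."""
    rng = np.random.default_rng(seed)
    rows, cols = W_hat.shape
    assert cols % dim == 0
    vecs = W_hat.reshape(-1, dim)
    codebook = vecs[rng.choice(len(vecs), n_codes, replace=False)]
    for _ in range(iters):
        dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for c in range(n_codes):
            members = vecs[assign == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook[assign].reshape(rows, cols)

def compress(W, X, mode="vq"):
    """Normalize -> compress -> de-normalize; output has W's original shape."""
    s = activation_scales(X)               # one scale per input channel (column of W)
    W_hat = W * s[None, :]                 # activation-aware normalization (assumed)
    C = prune_magnitude(W_hat) if mode == "prune" else vector_quantize(W_hat)
    return C / s[None, :]                  # undo the normalization

# Usage: compress a random layer and report the relative output error on activations.
W = np.random.randn(64, 128)
X = np.random.randn(256, 128)
W_c = compress(W, X, mode="vq")
print(W_c.shape, np.linalg.norm(X @ (W - W_c).T) / np.linalg.norm(X @ W.T))
```

Because both compressors return a matrix with the same shape as the input, the compressed layer can be dropped into the original model without any architectural changes, which is what "shape-preserving" refers to in the abstract.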
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 80