NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Published: 05 Mar 2025, Last Modified: 21 Apr 2025 · SLLM · CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: Quantization, Vector Quantization, LLMs, Compression, Sparsity, Pruning
Abstract: Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided compression), a unified framework for zero-shot shape-preserving compression algorithms. We compressed the Llama-2 7B/13B/70B and Llama-3 8B models using two popular forms of shape-preserving compression: vector quantization (NoWag-VQ, NoWag for Vector Quantization) and unstructured/structured pruning (NoWag-P, NoWag for Pruning). We found that NoWag-VQ significantly outperforms state-of-the-art zero-shot VQ methods, and that NoWag-P performs competitively against state-of-the-art pruning methods.
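The abstract describes a normalize-then-compress pipeline that keeps each weight matrix's original shape. Below is a minimal, hypothetical sketch of that general idea (not the authors' code): weights are scaled by assumed per-channel activation statistics, compressed by either magnitude pruning or a toy vector quantizer, and then rescaled. All function names, the specific normalization, and the hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of shape-preserving, activation-aware compression.
# Not the NoWag algorithm; an illustration under stated assumptions.
import numpy as np

def activation_scales(X, eps=1e-8):
    """Per-input-channel scale from activation second moments (assumed form)."""
    return np.sqrt((X ** 2).mean(axis=0)) + eps

def prune_magnitude(W_hat, sparsity=0.5):
    """Unstructured pruning: zero the smallest-magnitude entries, keep the shape."""
    k = int(W_hat.size * sparsity)
    thresh = np.partition(np.abs(W_hat).ravel(), k)[k]
    return np.where(np.abs(W_hat) < thresh, 0.0, W_hat)

def vector_quantize(W_hat, dim=4, n_codes=16, iters=10, seed=0):
    """Toy k-means VQ over length-`dim` sub-vectors; the matrix shape is preserved."""
    rng = np.random.default_rng(seed)
    rows, cols = W_hat.shape
    assert cols % dim == 0
    vecs = W_hat.reshape(-1, dim)
    codebook = vecs[rng.choice(len(vecs), n_codes, replace=False)]
    for _ in range(iters):
        dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for c in range(n_codes):
            members = vecs[assign == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook[assign].reshape(rows, cols)

def compress(W, X, mode="vq"):
    """Normalize -> compress -> de-normalize; output has W's original shape."""
    s = activation_scales(X)               # one scale per input channel (column of W)
    W_hat = W * s[None, :]                 # activation-aware normalization (assumed)
    C = prune_magnitude(W_hat) if mode == "prune" else vector_quantize(W_hat)
    return C / s[None, :]                  # undo the normalization

# Usage: compress a random layer and report the relative output error on activations.
W = np.random.randn(64, 128)
X = np.random.randn(256, 128)
W_c = compress(W, X, mode="vq")
print(W_c.shape, np.linalg.norm(X @ (W - W_c).T) / np.linalg.norm(X @ W.T))
```

Because both compressors return a matrix with the same shape as the input, the compressed layer can be dropped into the original model without any architectural changes, which is what "shape-preserving" refers to in the abstract.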
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 80