Recovery-on-the-line: Linear trends in post-quantization performance recovery

Published: 05 Mar 2025 · Last Modified: 05 Mar 2025 · Venue: SLLM · License: CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: quantization, evaluation
TL;DR: We conduct a suite of evaluations to analyze the relationships between the recovery of performance metrics across several tasks in compressed LLMs.
Abstract: Many state-of-the-art large language models exceed tens of billions of parameters. To compress these models, several prior works have proposed quantizing their weights and activations using techniques such as GPTQ and QuIP. An important criterion for quantized language models is their ability to \textit{recover} performance metrics such as accuracy after quantization; that is, the quantized models should remain as accurate as the original base models. It remains unclear, however, which evaluations are most needed to assess recovery. If, for instance, recovery across all tasks is strongly linear (i.e., recovery on task A is a linear function of recovery on task B), then a small subset of (independent) evaluations should suffice to characterize recovery overall. In this paper, we examine the trends in recovery across pairs of tasks and metrics. Drawing on prior work showing that in-distribution and out-of-distribution accuracy often exhibit a strong linear relationship, we show that this relationship holds for recovery of accuracy as well.
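
To make the notion of \textit{recovery} and the pairwise linear-trend check concrete, below is a minimal sketch (not the authors' code). It assumes recovery is the ratio of the quantized model's metric to the base model's metric; the model names, task names, and accuracy values are hypothetical placeholders.

```python
# Minimal sketch of a pairwise "recovery-on-the-line" check.
# Assumption: recovery = quantized-model metric / base-model metric.
# All model/task names and accuracies below are hypothetical.
import numpy as np

def recovery(quantized_acc: float, base_acc: float) -> float:
    """Fraction of the base model's metric retained after quantization."""
    return quantized_acc / base_acc

# Hypothetical per-model accuracies: (base accuracy, quantized accuracy)
task_a = {"model-7b": (0.62, 0.60), "model-13b": (0.68, 0.67), "model-70b": (0.75, 0.745)}
task_b = {"model-7b": (0.41, 0.38), "model-13b": (0.47, 0.455), "model-70b": (0.55, 0.545)}

models = sorted(task_a)
rec_a = np.array([recovery(q, b) for b, q in (task_a[m] for m in models)])
rec_b = np.array([recovery(q, b) for b, q in (task_b[m] for m in models)])

# Fit recovery on task B as a linear function of recovery on task A;
# a high R^2 across many quantized models indicates a linear trend.
slope, intercept = np.polyfit(rec_a, rec_b, deg=1)
r2 = np.corrcoef(rec_a, rec_b)[0, 1] ** 2
print(f"recovery(B) ~ {slope:.2f} * recovery(A) + {intercept:.2f}  (R^2 = {r2:.3f})")
```

In practice, each point on such a line would come from one quantized checkpoint (e.g., a model compressed with GPTQ or QuIP), so the fit is computed over many models rather than the three placeholders shown here.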
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Shashata_Sawmya1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 6