Keywords: LLM, Quantization, Lattice Algorithm, Closest Vector Problem
TL;DR: The GPTQ algorithm is exactly Babai's nearest plane algorithm for the closest vector problem, giving a geometric view of LLM quantization.
Abstract: Quantizing the weights of large language models (LLMs) from 16 bits to lower bit-widths is the de facto approach for deploying massive transformers on more affordable accelerators.
GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale.
Yet its inner workings are typically described as a sequence of ad hoc algebraic updates that obscure its geometric meaning and offer no worst-case guarantees.
In this work, we show that, when executed back-to-front (from the last to the first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs.
This equivalence rests on a rigorous mathematical argument and has two analytical consequences:
(i) the GPTQ error propagation step gains an intuitive geometric interpretation;
(ii) GPTQ inherits the error upper bound of Babai's algorithm under the no-clipping condition.
Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms into the design of future quantization algorithms for billion-parameter models.
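A minimal numerical sketch of the claimed equivalence (our illustration, not the authors' code). It assumes a plain round-to-nearest-integer quantizer (the no-clipping regime), a synthetic Hessian H = X^T X built from random calibration inputs, and the hypothetical helper names babai_nearest_plane and gptq_back_to_front:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 64
X = rng.standard_normal((m, n))   # synthetic calibration inputs
H = X.T @ X                       # proxy for the layer Hessian
w = rng.standard_normal(n)        # one row of the weight matrix

# Upper-triangular R with H = R^T R, so ||X (w - q)||^2 = ||R (w - q)||^2.
R = np.linalg.cholesky(H).T

def babai_nearest_plane(R, w):
    """Babai's nearest plane on the lattice with basis R, target R @ w.

    For an upper-triangular basis, the Gram-Schmidt vectors are
    R[i, i] * e_i, so the classic last-to-first sweep is a short loop.
    """
    y = R @ w
    q = np.zeros(len(w))
    for i in range(len(w) - 1, -1, -1):
        q[i] = np.round(y[i] / R[i, i])  # index of the nearest hyperplane
        y -= q[i] * R[:, i]              # recurse on the residual
    return q

def gptq_back_to_front(H, w):
    """GPTQ-style sweep from the last dimension to the first.

    U is upper triangular with H^{-1} = U @ U.T; each step quantizes one
    coordinate, then propagates the (rescaled) error to all coordinates
    not yet quantized -- the GPTQ error propagation step.
    """
    U = np.linalg.inv(np.linalg.cholesky(H)).T
    w, q = w.copy(), np.zeros(len(w))
    for i in range(len(w) - 1, -1, -1):
        q[i] = np.round(w[i])                # round-to-nearest, no clipping
        e = (w[i] - q[i]) / U[i, i]
        w[: i + 1] -= e * U[: i + 1, i]      # error propagation
    return q

# Generic random data keeps the pre-rounding values away from ties,
# so the two integer solutions should coincide exactly.
assert np.array_equal(babai_nearest_plane(R, w), gptq_back_to_front(H, w))
```

Under the same assumptions, Babai's classical guarantee reads ||R(w - q)||^2 <= (1/4) * sum_i R[i, i]^2, which is the kind of worst-case error bound that consequence (ii) says GPTQ inherits in the no-clipping case.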
Submission Number: 71