# The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm (ICLR 2026)

[arXiv.org](https://arxiv.org/abs/2507.18553) |
[GitHub.com](https://github.com/IST-DASLab/GPTQ-Babai) |
[Citation](#citation)

ICLR 2026:
[ICLR.cc](https://iclr.cc/virtual/2026/poster/10009876) |
[OpenReview.net](https://openreview.net/forum?id=NFB4QGGS65) |
[Poster](./assets/babai_poster.pdf)

NeurReps 2025 (NeurIPS 2025 Workshop):
[NeurIPS.cc](https://neurips.cc/virtual/2025/loc/san-diego/136811) |
[OpenReview.net](https://openreview.net/forum?id=xZ4IBFAMUd)

<p align="center">
<img src="./assets/babai.jpg" alt="Visualization of Babai's Algorithm" width="600">
</p>

Official repository for the ICLR 2026 paper "The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm" by *Jiale Chen*, *Yalda Shabanzadeh*, *Elvir Crnčević*, *Torsten Hoefler*, and *Dan Alistarh* from the *Institute of Science and Technology Austria (ISTA)*, *Red Hat, Inc.*, and *ETH Zürich*.

**Keywords:** LLM, Quantization, Lattice Algorithm, Closest Vector Problem

## Paper Abstract

**TL;DR:** The GPTQ algorithm is exactly Babai's nearest plane algorithm for the closest vector problem, giving a geometric view of LLM quantization.

> Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are described as a sequence of algebraic updates that obscure geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: first, the GPTQ error propagation step gains an intuitive geometric interpretation; second, GPTQ inherits the error upper bound of Babai's algorithm under the assumption that no weights are clipped. Leveraging this bound, we design post-training quantization methods that avoid clipping, and outperform the original GPTQ. In addition, we provide efficient GPU inference kernels for the resulting representation. Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models. Source code is available at https:<!---->//github.com/IST-DASLab/GPTQ-Babai.

Please read our full paper if you are interested in the research details.
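
For readers unfamiliar with the algorithm named in the TL;DR, the following is a minimal NumPy sketch of Babai's nearest plane algorithm on a generic lattice. It is not code from this repository: the function name `babai_nearest_plane`, the QR-based formulation, and the random basis and target are illustrative assumptions. How the lattice is built from a layer's Hessian, and how this maps onto GPTQ, is described in the paper.

```python
import numpy as np


def babai_nearest_plane(basis, target):
    """Approximately solve CVP: find integer coeffs with basis @ coeffs near target.

    basis:  (m, n) array with full column rank; columns are lattice basis vectors.
    target: (m,) array, the point to approximate.
    """
    # Reduced QR: the columns of q are the normalized Gram-Schmidt directions,
    # and r is upper triangular with basis = q @ r.
    q, r = np.linalg.qr(basis)
    y = q.T @ target  # target expressed in Gram-Schmidt coordinates
    n = basis.shape[1]
    coeffs = np.zeros(n, dtype=np.int64)
    # Go from the last dimension to the first, rounding to the nearest
    # hyperplane at each step after removing already-fixed contributions.
    for i in range(n - 1, -1, -1):
        residual = y[i] - r[i, i + 1:] @ coeffs[i + 1:]
        coeffs[i] = int(np.rint(residual / r[i, i]))
    return coeffs


# Hypothetical toy data, chosen only for illustration.
rng = np.random.default_rng(0)
basis = rng.normal(size=(4, 4))
target = rng.normal(size=4)

coeffs = babai_nearest_plane(basis, target)
error = np.linalg.norm(basis @ coeffs - target)

# Each Gram-Schmidt coordinate of the residual is at most |r_ii| / 2,
# which gives the worst-case bound 0.5 * sqrt(sum_i r_ii^2).
r_diag = np.diag(np.linalg.qr(basis)[1])
bound = 0.5 * np.sqrt(np.sum(r_diag ** 2))
print(coeffs, error, bound)
```

In this sketch the integer vector `coeffs` plays the role of the quantized values, and the back-to-front loop mirrors the last-to-first processing order mentioned in the abstract. The bound computed at the end holds only because nothing is clipped: `coeffs` may take any integer value.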

## Repository Structure

The main directories each correspond to a component of the paper:

- [notebooks](./notebooks) contains lightweight Jupyter notebooks for interactive toy examples that illustrate the equivalence of GPTQ and Babai's nearest plane algorithm.
- [quantization](./quantization) contains the main quantization and evaluation code, including the CLI entry point, GPTQ/HPTQ/SSQR implementations, and Triton kernels (Hessian accumulation, MSE grid selection, GPTQ error propagation, and min-pivot order).
- [inference_kernels](./inference_kernels) provides the SSQR CUDA inference package for fast low-bit matrix multiplication, together with installation instructions, usage demos, tests, and end-to-end benchmarks for quantized checkpoints.
- [plots](./plots) contains the scripts and generated `.pdf` and `.svg` figures for the paper.

## Citation

Please cite our paper if you find it useful. Thank you!

**Plain text:**

```text
Jiale Chen, Yalda Shabanzadeh, Elvir Crnčević, Torsten Hoefler, and Dan Alistarh. The geometry of LLM quantization: GPTQ as Babai's nearest plane algorithm. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=NFB4QGGS65.
```

**BibTeX:**

```bibtex
@inproceedings{
chen2026the,
title={The Geometry of {LLM} Quantization: {GPTQ} as Babai's Nearest Plane Algorithm},
author={Jiale Chen and Yalda Shabanzadeh and Elvir Crn{\v{c}}evi{\'c} and Torsten Hoefler and Dan Alistarh},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=NFB4QGGS65}
}
```
