Keywords: Large Language Models (LLMs), Model Compression, Weight Quantization, Benford's Law, Transformers, Data-free Quantization.
TL;DR: A data-free non-uniform weight quantizer for LLMs that replaces the uniform grid with a Benford's Law-inspired log-spaced codebook.
Abstract: The rapid growth of Large Language Models (LLMs) intensifies the need for effective compression, with weight quantization being the most widely adopted technique. Standard uniform quantizers assume that parameters are evenly distributed, an assumption at odds with the highly skewed distributions observed in practice. We propose Benford-Quant, a simple, data-free non-uniform quantizer inspired by Benford's Law, which predicts that leading digits follow a logarithmic distribution. Benford-Quant replaces the uniform grid with a log-spaced codebook, dedicating more resolution to the frequent small-magnitude weights. We provide both theoretical intuition and empirical evidence: (i) weights in the linear transformation layers of transformers adhere closely to Benford statistics, while normalization layers systematically deviate; (ii) on Small Language Models (SLMs), Benford-Quant consistently improves perplexity, reducing 4-bit perplexity on Gemma-270M by more than 10%; and (iii) on larger LLMs, it remains competitive, with the remaining gap explained by over-parameterization effects. Our results indicate that incorporating a Benford-inspired prior into the quantization grid is a low-cost modification that yields accuracy gains in aggressive few-bit regimes.
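For context, Benford's Law states that a leading digit $d \in \{1,\dots,9\}$ occurs with probability $\log_{10}(1 + 1/d)$, so small leading digits (and, by extension, small magnitudes) dominate. The snippet below is a minimal PyTorch sketch of the general idea described in the abstract, not the authors' implementation: the function names, the per-tensor range estimation, and the nearest-codeword assignment are illustrative assumptions.

```python
import torch

EPS = 1e-8  # illustrative floor to avoid log(0); not from the paper


def log_spaced_codebook(weights: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Build a signed, log-spaced codebook spanning the observed weight magnitudes.

    Half of the 2**n_bits levels go to each sign; levels are spaced geometrically,
    so small magnitudes (which Benford-like statistics make most frequent) get
    finer resolution than a uniform grid would provide.
    """
    n_levels = 2 ** (n_bits - 1)  # codewords per sign
    mags = weights.abs()
    lo = float(mags[mags > 0].min().clamp_min(EPS))
    hi = float(mags.max())
    # Geometric (log-uniform) spacing between the smallest and largest magnitude.
    pos = torch.logspace(torch.log10(torch.tensor(lo)).item(),
                         torch.log10(torch.tensor(hi)).item(),
                         steps=n_levels)
    # Note: zero is not a codeword in this sketch; zeros snap to the smallest level.
    return torch.cat([-pos.flip(0), pos])


def benford_quantize(weights: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Snap each weight to its nearest codeword in the log-spaced codebook."""
    codebook = log_spaced_codebook(weights, n_bits)
    idx = (weights.unsqueeze(-1) - codebook).abs().argmin(dim=-1)
    return codebook[idx]


# Example: 4-bit quantization of one linear layer's weight matrix.
w = torch.randn(256, 256) * 0.02
w_q = benford_quantize(w, n_bits=4)
print((w - w_q).abs().mean())  # mean absolute quantization error
```

In practice a data-free method like this would be applied per tensor (or per channel) to the transformation layers only, leaving normalization parameters in higher precision, consistent with the deviation noted in point (i) of the abstract.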
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14249