BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook

03 Sept 2025 (modified: 18 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: language model, model compression, computational efficiency, binary quantization, sub-1-bit compression
TL;DR: Sub-1-bit large language model quantization
Abstract: Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to $\pm$1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask management, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages weight transformation and binary pattern clustering to overcome these limitations, delivering both superior accuracy and efficiency. Our approach incorporates two key innovations: (1) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters, compressing them into compact indices with tailored distance metrics and sign-based centroid updates; (2) a Learnable Transformation that optimizes invertible scaling and rotation matrices to align binarized weights with full-precision distributions, enabling incoherence processing to enhance layer-wise representation quality. This eliminates the need for sparse masks, enabling efficient inference on standard hardware. Extensive evaluations across LLaMA-1/2/3, Qwen-2.5/3, and FBI-LLM families demonstrate that BTC-LLM establishes a new state-of-the-art for extreme LLM compression at 1.11$\sim$0.7 bits. Notably, our BTC-LLM delivers strong performance under extreme compression settings, with just a 3.1\% accuracy drop on LLaMA-2-13B at 0.8 bits in zero-shot benchmarks while achieving a 1.6$\times$ speedup over FP16. Code is in the Appendix.
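To make the first innovation concrete, below is a minimal sketch (not the authors' code) of the binary-codebook idea described in the abstract: sub-vectors of a {-1, +1} weight matrix are clustered k-means-style under Hamming distance, and centroids are updated by a per-coordinate sign (majority vote). The paper's tailored distance metrics and the learnable scaling/rotation transform are not reproduced; names such as `build_binary_codebook`, `vector_len`, and `num_centroids` are illustrative assumptions.

```python
# Minimal sketch: binary codebook over {-1,+1} weight sub-vectors with
# Hamming-distance assignment and sign-based (majority-vote) centroid updates.
import numpy as np

def build_binary_codebook(w_bin, vector_len=8, num_centroids=16, iters=10, seed=0):
    """Cluster length-`vector_len` binary sub-vectors of w_bin (values in {-1,+1})
    into `num_centroids` binary centroids; returns (codebook, indices).
    Effective storage is ~log2(num_centroids)/vector_len bits per weight."""
    rng = np.random.default_rng(seed)
    vecs = w_bin.reshape(-1, vector_len)                  # (N, v) binary sub-vectors
    codebook = vecs[rng.choice(len(vecs), num_centroids, replace=False)].copy()
    for _ in range(iters):
        # For +/-1 vectors, Hamming distance = (v - dot)/2, so nearest centroid = max dot.
        dots = vecs @ codebook.T                          # (N, K)
        idx = np.argmax(dots, axis=1)                     # nearest centroid per vector
        for k in range(num_centroids):
            members = vecs[idx == k]
            if len(members):
                # Sign-based centroid update: per-coordinate majority vote.
                codebook[k] = np.where(members.mean(axis=0) >= 0, 1, -1)
    return codebook, idx

def decode(codebook, idx, shape):
    """Reconstruct the binary weight matrix from codebook indices."""
    return codebook[idx].reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = np.sign(rng.standard_normal((256, 256))).astype(np.int8)
    w[w == 0] = 1
    cb, idx = build_binary_codebook(w)
    w_hat = decode(cb, idx, w.shape)
    bits_per_weight = np.log2(16) / 8                     # 0.5 bits/weight for this config
    print("reconstruction match rate:", (w_hat == w).mean(),
          "| bits/weight (excl. codebook):", bits_per_weight)
```

Because every sub-vector is replaced by a codebook index, storage drops below 1 bit per weight (log2(K)/v bits here, plus a small codebook), and inference needs only a table lookup rather than a sparse mask.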
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1732