Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, Venue: ICML 2026 regular, License: CC BY-NC-SA 4.0
TL;DR: We present a novel paradigm for training 1-bit large language models (LLMs) by leveraging pre-trained floating-point models, addressing the limitations of existing 1-bit quantization methods that rely on training from scratch.
Abstract: 1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch and fail to fully leverage pre-trained models, which results in high training costs and notable accuracy degradation. We identify that the large gap between full-precision and 1-bit representations makes naive adaptation difficult. In this paper, we introduce a progressive training scheme, applied consistently in both the forward and backward passes, that smoothly converts the full-precision weights into binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to reduce the difficulty of progressive training and improve performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be obtained from pre-trained models, eliminating the need for expensive training from scratch.
Lay Summary: Modern Large Language Models (LLMs) typically operate in a 16-bit format, demanding massive memory and computational resources, but compressing them to an extreme "1-bit" format can drastically reduce these costs. While current methods require building 1-bit models from scratch, directly converting a pre-trained 16-bit model to 1-bit causes a severe drop in accuracy because the numerical gap between them is too drastic. To bridge this gap, we introduce a progressive training method that smoothly and gradually transitions a pre-trained 16-bit model into a 1-bit format, paired with specialized initialization and compensation techniques to preserve the model's intelligence. Our experiments across various model sizes demonstrate that this method significantly outperforms existing approaches, proving that highly capable 1-bit LLMs can be successfully adapted from existing 16-bit models without the need for costly training from scratch.
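The idea of gradually moving pre-trained weights toward a 1-bit form can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the linear blend controlled by `lam`, the per-tensor scale `alpha = mean(|w|)`, and the linear schedule are hypothetical choices, and the sketch does not cover the backward pass, binary-aware initialization, or dual-scaling compensation.

```python
def binarize(weights):
    """Sign binarization with an assumed per-tensor scale alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def progressive_weights(weights, lam):
    """Blend full-precision and binarized weights (hypothetical scheme).

    lam = 0.0 returns the full-precision weights unchanged;
    lam = 1.0 returns the fully binarized weights.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    binary = binarize(weights)
    return [(1.0 - lam) * w + lam * b for w, b in zip(weights, binary)]


def linear_schedule(step, total_steps):
    """Assumed schedule: lam grows linearly from 0 to 1 and is then held at 1."""
    return min(1.0, max(0.0, step / total_steps))
```

In a training loop under these assumptions, one would call `progressive_weights(w, linear_schedule(step, total_steps))` at each step, so the weights used by the model start at the pre-trained values and end fully binarized.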
Primary Area: Deep Learning->Large Language Models
Keywords: 1-bit quantization, progressive training
Originally Submitted PDF: pdf
Submission Number: 19244