Keywords: Binary Neural Networks, Large Language Models, Quantization-aware Training
TL;DR: We present a novel paradigm for training 1-bit large language models (LLMs) by leveraging pretrained floating-point models, addressing the limitations of existing 1-bit quantization methods that rely on training from scratch.
Abstract: 1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models. This results in high training costs and notable accuracy degradation. We identify that the large gap between full-precision and 1-bit representations makes naive adaptation difficult. In this paper, we introduce a consistent progressive training scheme for both the forward and backward passes that smoothly converts the full-precision weights into binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to reduce the difficulty of progressive training and improve performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be achieved using pre-trained models, eliminating the need for expensive training from scratch.
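To make the idea of progressively converting full-precision weights into binarized ones concrete, here is a minimal sketch of a progressively binarized linear layer. The linear interpolation schedule, the per-output-channel scale, and the straight-through estimator used here are illustrative assumptions; the paper's exact progressive schedule and dual-scaling compensation are not specified in this abstract.

```python
# Sketch of progressive weight binarization (assumed formulation, not the paper's exact method).
import torch
import torch.nn as nn


class ProgressiveBinaryLinear(nn.Module):
    """Linear layer whose weights are gradually pushed toward {-alpha, +alpha}."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # lambda_t in [0, 1]: 0 = full precision, 1 = fully binarized.
        self.register_buffer("lambda_t", torch.zeros(()))

    def set_progress(self, lam: float) -> None:
        self.lambda_t.fill_(lam)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-output-channel scale (a common BNN choice; the paper's dual scaling may differ)
        # so binarized weights roughly preserve the original magnitude.
        alpha = w.abs().mean(dim=1, keepdim=True)
        w_bin = alpha * torch.sign(w)
        # Mix full-precision and binarized weights according to the schedule.
        w_mix = (1.0 - self.lambda_t) * w + self.lambda_t * w_bin
        # Straight-through estimator: forward uses the mixed weights,
        # backward treats the transformation as identity.
        w_eff = w + (w_mix - w).detach()
        return nn.functional.linear(x, w_eff)


if __name__ == "__main__":
    layer = ProgressiveBinaryLinear(16, 8)
    x = torch.randn(4, 16)
    for lam in (0.0, 0.5, 1.0):  # anneal from full precision toward fully binary
        layer.set_progress(lam)
        y = layer(x)
        print(f"lambda={lam:.1f}, output norm={y.norm():.3f}")
```

The key point the sketch illustrates is that the forward pass sees an increasingly binarized weight as the schedule advances, while gradients continue to update the underlying full-precision weights, so a pre-trained model can be adapted rather than retrained from scratch.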
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14996