BitNet Distillation

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | License: CC BY 4.0
Keywords: quantization-aware training, post-training quantization
Abstract: Recent advances in extremely low-bit large language models (LLMs), such as the 1.58-bit BitNet, present a promising avenue for improving LLM efficiency in both inference speed and energy consumption. However, deploying such low-bit LLMs for downstream, task-specific applications often requires pretraining from scratch to achieve strong performance, which is expensive in both computation and energy. To overcome this limitation, we propose the BitNet Distillation Framework (BDF), a method that fine-tunes pre-trained, full-precision LLMs to 1.58-bit precision for specific downstream tasks, achieving comparable task-specific performance while incurring minimal computational overhead. Specifically, our approach introduces innovations along three dimensions (modeling, training, and distillation) to address the performance degradation and poor scalability often observed in existing fine-tuning methods for low-bit LLMs. Our experimental results demonstrate that BDF achieves performance on downstream tasks comparable to that of full-precision models. This facilitates the deployment of larger LLMs on edge devices across a variety of task-specific applications, enabling faster inference and lower energy consumption.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8791
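For readers unfamiliar with the term, "1.58-bit" refers to weights restricted to the three values {-1, 0, +1}, since log2(3) is roughly 1.58. The abstract does not specify the quantizer, so the following is only a minimal sketch assuming the absmean ternary scheme commonly associated with BitNet b1.58; the function names `absmean_ternary_quantize` and `dequantize` are hypothetical and are not taken from the submission.

```python
import math


def absmean_ternary_quantize(weights, eps=1e-5):
    """Map a flat list of floats to {-1, 0, +1} with one shared scale.

    Assumed scheme (not confirmed by the abstract): divide by the mean
    absolute value, round to the nearest integer, then clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Recover an approximate full-precision weight list."""
    return [t * scale for t in ternary]


# Three possible values per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)

# Illustrative weights only; not data from the paper.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
ternary, scale = absmean_ternary_quantize(weights)
approx = dequantize(ternary, scale)
```

In this sketch small weights collapse to 0 and larger ones saturate at -1 or +1, which illustrates why naive conversion of a full-precision model can lose accuracy and why a fine-tuning or distillation stage is motivated.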