Keywords: LLM, Quantization, Binary-coding Quantization (BCQ), Uniform Quantization (UQ)
TL;DR: We propose FlexBCQ, an accurate training algorithm for binary-coding quantization (BCQ). We reformulate the quantization process of BCQ so that it can exploit advanced training techniques designed for uniform quantization while preserving BCQ's expressive power.
Abstract: How can we compress large language models without compromising accuracy?
Quantization, which reduces the number of bits for representing weights, is an essential technique to utilize large language models (LLMs) in real-world applications.
Specifically, binary-coding quantization (BCQ) is a promising approach because it offers fast inference and an extensive representation space that encompasses the representation space of uniform quantization (UQ).
However, because accurate optimization techniques for BCQ are lacking, BCQ performs worse than UQ algorithms and fails to exploit its own expressive power.
In this paper, we propose FlexBCQ (Flexible Binary-coding Quantization), an accurate optimization algorithm for BCQ.
We leverage the sophisticated optimization techniques of UQ by decomposing the quantization process of BCQ into the composition of a UQ and an inner BCQ.
As a result, we take advantage of both the sophisticated optimization techniques of UQ, specifically the flexible mapping technique, and the strong expressive capability of BCQ.
Through extensive experiments, we find that FlexBCQ provides 3.24%p higher accuracy than existing UQ and BCQ algorithms on the MMLU 5-shot benchmark when quantizing a Llama-3 70B model to 3 bits.
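The statement that BCQ's representation space encompasses that of UQ can be checked numerically. The sketch below is an illustration only, not the FlexBCQ algorithm: it assumes the common BCQ form with a bias term, w ≈ Σ_i α_i b_i + z with b_i ∈ {-1, +1}, and the function names (`uq_grid`, `bcq_grid`, `uq_as_bcq`) are hypothetical. Choosing α_i = s·2^(i-1) and z = s·((2^k - 1)/2 - zero_point) reproduces a k-bit uniform grid with scale s, while other choices of α_i give non-uniform grids that UQ cannot express.

```python
from itertools import product


def uq_grid(bits, scale, zero_point):
    """All levels of a uniform quantizer: scale * (q - zero_point), q = 0..2^bits - 1."""
    return sorted(scale * (q - zero_point) for q in range(2 ** bits))


def bcq_grid(alphas, bias):
    """All levels of a BCQ quantizer: sum_i alphas[i] * b_i + bias, b_i in {-1, +1}."""
    return sorted(
        sum(a * b for a, b in zip(alphas, signs)) + bias
        for signs in product((-1, 1), repeat=len(alphas))
    )


def uq_as_bcq(bits, scale, zero_point):
    """BCQ parameters that reproduce a given uniform grid (assumes BCQ has a bias term)."""
    # q = sum_i 2^i * c_i with c_i = (b_i + 1) / 2, so each bit contributes
    # scale * 2^(i-1) * b_i and the constant parts collect into the bias.
    alphas = [scale * 2 ** (i - 1) for i in range(bits)]
    bias = scale * ((2 ** bits - 1) / 2 - zero_point)
    return alphas, bias


bits, scale, zero_point = 3, 0.25, 3
alphas, bias = uq_as_bcq(bits, scale, zero_point)
uq = uq_grid(bits, scale, zero_point)
bcq = bcq_grid(alphas, bias)
same = all(abs(u - v) < 1e-12 for u, v in zip(uq, bcq))

# A BCQ grid with unrelated alphas is generally not evenly spaced.
nonuniform = bcq_grid([0.5, 0.2, 0.1], 0.0)
gaps = {round(b - a, 12) for a, b in zip(nonuniform, nonuniform[1:])}
```

Here `same` confirms that the two grids coincide for this configuration, and `gaps` containing more than one distinct spacing shows a BCQ grid that no uniform quantizer with the same bit width can match.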
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8982