Keywords: Quantization-Aware Training, quantization, Parameter-Efficient Fine-Tuning
Abstract: With the rapid development of large language models (LLMs), their prohibitive memory footprints and intensive computational demands have become critical bottlenecks. Quantization-Aware Training (QAT) has emerged as a primary solution to these challenges: by explicitly simulating quantization effects within the training loop, it enables low-bit models to achieve accuracy comparable to their full-precision counterparts.
In this work, we provide a comprehensive survey of QAT,
intended as a resource for researchers seeking to understand the theory of QAT
and its evolving implementation landscape.
To the best of our knowledge, this is the first systematic survey dedicated to reviewing
recent developments in QAT. We systematically review existing QAT methodologies
using a taxonomy organized by quantization target, provide an in-depth analysis of the technical connections and distinctions among these methods, and summarize their evaluation paradigms.
Furthermore, we discuss persistent challenges and outline potential directions for future research.
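As a minimal illustration of the simulated-quantization step mentioned in the abstract, the sketch below shows a uniform symmetric fake quantizer with a straight-through estimator in PyTorch; the function name, bit-width, and clamping scheme are illustrative assumptions, not a method from the surveyed paper.

```python
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # Illustrative sketch: uniform symmetric quantization of a weight tensor.
    # Map weights onto a low-bit integer grid, then back to float so the
    # rest of training still runs in full precision.
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: the forward pass sees quantized values,
    # while gradients flow through as if quantization were the identity.
    return w + (w_q - w).detach()
```

In a typical QAT loop, such a function would be applied to weights (and often activations) inside the forward pass, so the loss reflects quantization error while the optimizer updates the underlying full-precision parameters.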
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: quantization, distillation, NLP in resource-constrained settings
Contribution Types: Surveys
Languages Studied: English
Submission Number: 8405