Keywords: Quantization-Aware Training, quantization, Parameter-Efficient Fine-Tuning
Abstract: With the rapid development of large language models (LLMs), their prohibitive memory footprints and intensive computational demands have become critical bottlenecks. Quantization-Aware Training (QAT) has emerged as a primary solution to these challenges: by explicitly simulating quantization effects within the training loop, it enables low-bit models to achieve accuracy comparable to their full-precision counterparts.
In this work, we provide a comprehensive survey of QAT,
intended as a resource for researchers seeking to understand the theory of QAT
and its evolving implementation landscape.
To the best of our knowledge, this is the first systematic survey dedicated to reviewing
recent developments in QAT. We systematically review existing QAT methodologies
using a taxonomy organized by quantization target, provide an in-depth analysis of the technical connections and distinctions among these methods, and summarize their evaluation paradigms.
Furthermore, we discuss persistent challenges and outline potential directions for future research.
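As a minimal illustration of the simulated-quantization step mentioned in the abstract, the sketch below shows a uniform symmetric fake quantizer with a straight-through estimator in PyTorch; the function name, bit-width, and clamping scheme are illustrative assumptions, not a method from the surveyed paper.

```python
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # Illustrative sketch: uniform symmetric quantization of a weight tensor.
    # Map weights onto a low-bit integer grid, then back to float so the
    # rest of training still runs in full precision.
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: the forward pass sees quantized values,
    # while gradients flow through as if quantization were the identity.
    return w + (w_q - w).detach()
```

In a typical QAT loop, such a function would be applied to weights (and often activations) inside the forward pass, so the loss reflects quantization error while the optimizer updates the underlying full-precision parameters.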
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: quantization, distillation, NLP in resource-constrained settings
Contribution Types: Surveys
Languages Studied: English
Submission Number: 8405