TL;DR: We propose a principled method for quantization-aware training (QAT) following a convex regularization framework.
Abstract: We develop a novel optimization method for quantization-aware training (QAT). Specifically, we show that convex, piecewise-affine regularization (PAR) can effectively induce neural network weights to cluster towards discrete values. We minimize PAR-regularized loss functions using an aggregate proximal stochastic gradient method (AProx) and prove that it enjoys last-iterate convergence. Our approach, which we call PARQ, provides an interpretation of the straight-through estimator (STE), a widely used heuristic for QAT, as its asymptotic form. We conduct experiments to demonstrate that PARQ obtains competitive performance on convolution- and transformer-based vision tasks.
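Illustrative code sketch: the snippet below is a minimal, hypothetical illustration (not the authors' implementation) of how a convex piecewise-affine regularizer with kinks at the target quantization levels can be handled via its proximal operator. Because the slopes increase across the kinks, the prox pulls each weight toward a nearby level, a multi-level analogue of soft-thresholding. The slope schedule, the example levels, and the name `par_prox` are assumptions made for illustration only.

```python
import torch


def par_prox(v, levels, lam):
    """Element-wise prox of lam * r, where r is a convex piecewise-affine
    function whose kinks sit at the (sorted) quantization levels.

    NOTE: the linear slope schedule below is an illustrative assumption,
    not the exact regularizer from the paper. What matters is that the
    slopes increase across the kinks, so each kink is a "sticky" point
    of the prox mapping.
    """
    q = torch.as_tensor(levels, dtype=v.dtype, device=v.device)            # (m,) sorted levels
    m = q.numel()
    s = torch.linspace(-1.0, 1.0, m + 1, dtype=v.dtype, device=v.device)   # (m+1,) interval slopes

    # Values of r at the kinks (an additive constant does not affect the prox).
    kv = torch.zeros(m, dtype=v.dtype, device=v.device)
    kv[1:] = torch.cumsum(s[1:m] * (q[1:] - q[:-1]), dim=0)

    neg_inf = torch.full((1,), -float("inf"), dtype=v.dtype, device=v.device)
    pos_inf = torch.full((1,), float("inf"), dtype=v.dtype, device=v.device)
    lo = torch.cat([neg_inf, q])          # lower end of each affine piece
    hi = torch.cat([q, pos_inf])          # upper end of each affine piece
    anchor = torch.cat([q[:1], q])        # reference kink of each piece
    r_anchor = torch.cat([kv[:1], kv])    # r evaluated at that kink

    # One candidate minimizer per piece: the unconstrained stationary point
    # v - lam * slope, clipped back into the piece (this also covers the kinks).
    flat = v.reshape(1, -1)
    cand = torch.clamp(flat - lam * s.view(-1, 1), min=lo.view(-1, 1), max=hi.view(-1, 1))
    r_cand = r_anchor.view(-1, 1) + s.view(-1, 1) * (cand - anchor.view(-1, 1))
    obj = 0.5 * (cand - flat) ** 2 + lam * r_cand
    best = obj.argmin(dim=0, keepdim=True)
    return cand.gather(0, best).reshape(v.shape)


if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(6)
    # With a small lam, weights shrink toward the kinks; as lam grows,
    # more of them land exactly on the quantization levels.
    print(w)
    print(par_prox(w, levels=[-1.0, 0.0, 1.0], lam=0.25))
```

In a proximal stochastic gradient (AProx-style) QAT loop, one would roughly alternate a stochastic gradient step on the task loss with a prox step like the one above, increasing the regularization strength over training; in the limit of very large strength, the prox collapses to hard rounding onto the levels, which is the kind of asymptotic behavior the abstract relates to the STE heuristic.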
Lay Summary: Modern AI models exhibit exceptional vision and language processing capabilities, but they often come with excessive size and heavy demands on memory and computing. Quantization is an effective approach to model compression that can significantly reduce a model's memory footprint, computing cost, and inference latency. Existing methods for quantization-aware training rely heavily on heuristics and have weak convergence guarantees. We propose a new algorithm based on a rigorous optimization framework, which gives a principled interpretation of a widely used heuristic. In addition, it obtains competitive empirical performance and enjoys strong convergence guarantees.
Primary Area: Optimization
Keywords: quantization-aware training, proximal gradient method, convex regularization, stochastic gradient method, last-iterate convergence
Submission Number: 9519