LOTION: Smoothing the Optimization Landscape for Quantized Training

Published: 22 Sept 2025 · Last Modified: 02 Dec 2025 · NeurIPS 2025 Workshop · CC BY 4.0
Keywords: Optimization, quantization, large language models
TL;DR: We introduce LOTION, a principled smoothing framework that replaces the raw, discontinuous quantized loss with its expectation under unbiased stochastic-rounding noise.
Abstract: Optimizing neural networks to minimize a quantized loss is difficult because the quantized loss surface is discontinuous. Most previous methods address this issue by relaxing the gradient computation with techniques such as the Straight-Through Estimator (STE). However, these algorithms provide no convergence guarantees. In this work, taking inspiration from Nesterov smoothing, we relax the loss function by approximating the quantized loss surface with a smoothed loss: the expected quantized loss after perturbing the weights with random noise. In particular, we introduce LOTION, a principled smoothing framework that replaces the raw quantized loss with its expectation under unbiased stochastic-rounding noise. In this framework, standard optimizers are guaranteed to converge to a local minimum of the smoothed loss surface. Moreover, when the noise is derived from stochastic rounding, we show that the global minima of the original quantized loss are preserved. We empirically demonstrate that this method outperforms QAT on synthetic testbeds and in large language model experiments.
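To make the central idea concrete, the sketch below is a minimal NumPy illustration of the smoothing described in the abstract: weights are quantized with unbiased stochastic rounding, and the smoothed loss is estimated as a Monte Carlo average of the quantized loss over the rounding noise. The function names, the uniform grid step `scale`, the sample count, and the quadratic toy loss are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def stochastic_round(w, scale, rng):
    """Unbiased stochastic rounding of weights w onto a uniform grid with step `scale`.

    Each weight rounds up with probability equal to its fractional position
    between the two nearest grid points, so E[stochastic_round(w)] = w.
    """
    scaled = w / scale
    lower = np.floor(scaled)
    p_up = scaled - lower                      # probability of rounding up
    rounded = lower + (rng.random(w.shape) < p_up)
    return rounded * scale

def smoothed_loss(loss_fn, w, scale, rng, n_samples=8):
    """Monte Carlo estimate of the smoothed loss: the expectation of the
    quantized loss under stochastic-rounding noise (illustrative sketch)."""
    return np.mean([loss_fn(stochastic_round(w, scale, rng))
                    for _ in range(n_samples)])

# Toy usage: a quadratic loss evaluated on quantized weights.
rng = np.random.default_rng(0)
w = np.array([0.30, -0.72, 1.05])
loss = lambda q: float(np.sum((q - 0.5) ** 2))
print(smoothed_loss(loss, w, scale=0.25, rng=rng))
```

Because the rounding noise is unbiased, averaging the quantized loss over it yields a smooth surrogate of the discontinuous quantized loss that standard optimizers can minimize.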
Submission Number: 104