Keywords: Optimization, quantization, large language models
TL;DR: We introduce LOTION, a principled smoothing framework that replaces the raw, discontinuous quantized loss with its expectation under unbiased stochastic-rounding noise
Abstract: Optimizing neural networks to minimize the quantized loss is difficult because the quantized loss surface is discontinuous. Most previous methods address this issue by relaxing the gradient computation using techniques such as the Straight-Through Estimator (STE). However, these algorithms provide no convergence guarantees. In this work, taking inspiration from Nesterov smoothing, we instead relax the loss function itself: we approximate the quantized loss surface with a smoothed loss, defined as the expected quantized loss after perturbing the weights with random noise.
In particular, we introduce LOTION, a principled smoothing framework that replaces the raw quantized loss with its expectation under unbiased stochastic-rounding noise.
In this framework, standard
optimizers are guaranteed to converge to a local minimum of the
smoothed loss surface. Moreover, when using noise derived from
stochastic rounding, we show that the global minima of the original
quantized loss are preserved. We empirically demonstrate that this
method outperforms quantization-aware training (QAT) on synthetic testbeds and in large language model experiments.
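To make the smoothing construction concrete, here is a minimal sketch, not the paper's implementation, of the central quantity in the abstract: a Monte Carlo estimate of the expected quantized loss under unbiased stochastic-rounding noise, assuming a uniform quantization grid with spacing `step`. The names `loss_fn`, `step`, and `num_samples` are illustrative assumptions, not identifiers from the paper.

```python
# Illustrative sketch: smoothed (expected) quantized loss under unbiased
# stochastic rounding, estimated by Monte Carlo sampling.
import numpy as np

def stochastic_round(w, step, rng):
    """Unbiased stochastic rounding of w onto a uniform grid with spacing `step`."""
    scaled = w / step
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[Q(w)] = w.
    prob_up = scaled - lower
    return (lower + (rng.random(w.shape) < prob_up)) * step

def smoothed_loss(loss_fn, w, step, num_samples=8, rng=None):
    """Monte Carlo estimate of E[loss_fn(Q_SR(w))] over stochastic-rounding noise."""
    rng = rng or np.random.default_rng(0)
    return np.mean([loss_fn(stochastic_round(w, step, rng))
                    for _ in range(num_samples)])

# Toy usage: a quadratic loss evaluated with a coarse (e.g. 4-bit-like) grid spacing.
if __name__ == "__main__":
    loss_fn = lambda w: float(np.sum((w - 0.3) ** 2))
    w = np.array([0.17, -0.42, 0.88])
    print(smoothed_loss(loss_fn, w, step=1 / 8))
```

Because the rounding noise is unbiased, the smoothed objective is a continuous surrogate that an ordinary optimizer can minimize, which is the property the abstract's convergence claim relies on.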
Submission Number: 104