Ternary Momentum For Quantized Training

Published: 30 Mar 2026 · Last Modified: 30 Mar 2026 · Accepted by TMLR · CC BY 4.0
Abstract: Quantization enables efficient inference on resource-limited devices, yet training still depends on high-precision gradients and optimizer states. We address this gap by introducing stochastic ternary momentum, a fully quantized optimizer that operates with quantized parameters, ternary gradient information, and ternary momentum states, enabling stable and memory-efficient quantized optimization. Our method replaces deterministic, full-precision updates with integer-valued updates driven by stochastic sampling, ensuring that expected updates match standard momentum while maintaining strict memory constraints. It eliminates re-quantization overhead and preserves quantization consistency throughout training. We establish theoretical convergence guarantees for our ternary momentum method for convex objectives over bounded integer domains and for non-convex objectives over unbounded integer domains. Experiments on vision and language tasks demonstrate that our approach retains strong performance while reducing optimizer memory by 95% compared to a full-precision optimizer, advancing the feasibility of fully quantized training.
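To illustrate the core mechanism the abstract describes, below is a minimal sketch of unbiased stochastic ternarization applied to a momentum buffer. This is an assumption-laden illustration, not the paper's implementation: the function names (`stochastic_ternary`, `ternary_momentum_step`), the NumPy realization, and the specific blending scheme are all hypothetical; the only property taken from the abstract is that updates are sampled so their expectation matches standard momentum while states stay in {-1, 0, +1}.

```python
import numpy as np

def stochastic_ternary(x):
    """Unbiasedly round values in [-1, 1] to {-1, 0, +1}.

    Each element is mapped to sign(x) with probability |x| and to 0
    otherwise, so E[output] = x. (Hypothetical helper, not from the paper.)
    """
    p = np.clip(np.abs(x), 0.0, 1.0)           # probability of a nonzero draw
    draw = np.random.rand(*x.shape) < p        # Bernoulli(p) per element
    return np.sign(x) * draw                   # ternary values in {-1, 0, +1}

def ternary_momentum_step(m_ternary, g_ternary, beta=0.9):
    """One sketch of a momentum step that keeps the buffer ternary.

    The exponential blend is real-valued only transiently; stochastic
    sampling returns it to {-1, 0, +1} with the blend as its expectation.
    """
    blended = beta * m_ternary + (1.0 - beta) * g_ternary  # lies in [-1, 1]
    return stochastic_ternary(blended)

# Usage: both the momentum state and the gradient signal are ternary.
rng_shape = (4,)
m = np.array([1.0, 0.0, -1.0, 1.0])   # ternary momentum buffer
g = np.array([-1.0, 1.0, -1.0, 0.0])  # ternary gradient information
m = ternary_momentum_step(m, g)
print(m)  # e.g. [ 1.  0. -1.  1.] -- varies run to run, unbiased in expectation
```

Storing the buffer as ternary values is what drives the memory saving: two bits per element versus 32 for a float momentum state, consistent in spirit with the roughly 95% optimizer-memory reduction the abstract reports.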
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ran_Tian1
Submission Number: 6458