Abstract: Quantization enables efficient inference on resource-limited devices, yet training still depends on high-precision gradients and optimizer states.
We address this gap by introducing stochastic ternary momentum, a fully quantized optimizer that operates on quantized parameters with ternary gradient information and maintains ternary momentum states for stable, memory-efficient quantized optimization.
Our method replaces deterministic full-precision updates with integer-valued updates driven by stochastic sampling, so that the expected update matches standard momentum while strict memory constraints are preserved.
It eliminates re-quantization overhead and preserves quantization consistency throughout training.
We establish theoretical convergence guarantees for our ternary momentum method on convex objectives over bounded integer domains and on non-convex objectives over unbounded integer domains. Experiments on vision and language tasks demonstrate that our approach retains strong performance while reducing optimizer memory by 95\% relative to full-precision optimizers, advancing the feasibility of fully quantized training.
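The core mechanism the abstract describes — integer-valued updates whose expectation matches the full-precision quantity — can be realized by stochastic rounding to ternary values. The sketch below is an illustrative assumption, not the paper's implementation: the function name, the `scale` parameter, and the clipping behavior are choices made here to show how an unbiased ternary quantizer can work.

```python
import numpy as np

def stochastic_ternary(x, scale, rng=None):
    """Stochastically quantize a float array to {-1, 0, +1}.

    Unbiased within range: E[q] * scale == clip(x, -scale, scale).
    (Hypothetical helper; the paper's actual quantizer may differ.)
    """
    rng = rng or np.random.default_rng()
    y = np.clip(x / scale, -1.0, 1.0)          # normalize to [-1, 1]
    p = np.abs(y)                               # P(round away from zero)
    q = np.sign(y) * (rng.random(x.shape) < p)  # sample ternary value
    return q.astype(np.int8)
```

Because the sampled value equals `sign(y)` with probability `|y|` and `0` otherwise, the expected quantized update matches the full-precision one, which is the property the convergence analysis relies on.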
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ran_Tian1
Submission Number: 6458