Last-iterate convergence analysis of stochastic momentum methods for neural networks

Published: 2023 · Last Modified: 04 Nov 2025 · Neurocomputing 2023 · CC BY-SA 4.0
Abstract: The stochastic momentum method is a commonly used acceleration technique for solving large-scale stochastic optimization problems. Existing convergence results for stochastic momentum methods in the non-convex stochastic setting mostly concern a randomly selected iterate or the best iterate, both of which require temporal and spatial statistics over the history of iterates. Last-iterate convergence, by contrast, avoids storing or selecting past iterates after each iteration while maintaining rigour in the convergence analysis. To this end, we establish convergence of the last iterate (called last-iterate convergence) of stochastic momentum methods for non-convex stochastic optimization problems, in a manner consistent with traditional optimization theory. For generality, we prove last-iterate convergence under a unified framework, the stochastic unified momentum (SUM) method, which covers both stochastic heavy ball momentum and stochastic Nesterov accelerated gradient momentum, with momentum factors that may be either constant or time-varying. Finally, the last-iterate convergence of the stochastic momentum methods is verified on the benchmark MNIST and CIFAR-10 datasets. The implementation of SUM is available at: https://github.com/xudp100/SUM.
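To make the unified framework concrete, below is a minimal sketch of a unified stochastic momentum update, assuming the common parameterization in which an interpolation parameter s = 0 recovers stochastic heavy ball (SHB) and s = 1 recovers stochastic Nesterov accelerated gradient (SNAG). The function names, step size, and momentum values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sum_step(x, y_s_prev, grad_fn, alpha=0.05, beta=0.9, s=1.0):
    """One step of a unified stochastic momentum update (illustrative sketch).

    s = 0 recovers stochastic heavy ball (SHB);
    s = 1 recovers stochastic Nesterov accelerated gradient (SNAG).
    """
    g = grad_fn(x)                        # stochastic gradient at the current iterate
    y = x - alpha * g                     # plain SGD step
    y_s = x - s * alpha * g               # auxiliary sequence parameterized by s
    x_next = y + beta * (y_s - y_s_prev)  # momentum correction from the auxiliary sequence
    return x_next, y_s

# Toy usage: minimize f(x) = x^2 with noisy gradients (hypothetical example).
rng = np.random.default_rng(0)
noisy_grad = lambda x: 2.0 * x + 0.1 * rng.standard_normal(x.shape)
x, y_s_prev = np.array([5.0]), np.array([5.0])  # common initialization: y_s_0 = x_0
for t in range(200):
    x, y_s_prev = sum_step(x, y_s_prev, noisy_grad)
print(x)  # the last iterate should be near the minimizer 0
```

The loop reports only the final iterate, mirroring the last-iterate viewpoint of the paper: no past iterates are stored or compared, in contrast to random-output or minimum-output analyses.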