Variance-reduced reshuffling gradient descent for nonconvex optimization: Centralized and distributed algorithms
Abstract: Nonconvex finite-sum optimization plays a crucial role in signal processing and machine learning, and has fueled the development of numerous centralized and distributed stochastic algorithms. However, existing stochastic optimization algorithms often suffer from high stochastic-gradient variance caused by random sampling with replacement. To address this issue, this paper introduces an explicit variance-reduction step and proposes variance-reduced reshuffling gradient algorithms based on a sampling-without-replacement scheme. Specifically, this paper proves that the proposed centralized variance-reduced reshuffling gradient algorithm (VR-RG) with constant step sizes converges to a stationary point of the nonconvex problem under the Kurdyka–Łojasiewicz condition. Moreover, for nonconvex optimization over connected multi-agent networks, the proposed distributed variance-reduced reshuffling gradient algorithm (DVR-RG) converges to a neighborhood of a stationary point, and this neighborhood can be made arbitrarily small under mild conditions. Notably, DVR-RG requires only one communication round per epoch. Finally, numerical simulations demonstrate the efficiency of the proposed algorithms.
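To make the two ingredients named in the abstract concrete, the following is a minimal, hypothetical sketch of a reshuffling gradient loop with an explicit variance-reduction step. It is not the paper's exact VR-RG update: the correction is assumed here to take an SVRG-style form (a full gradient evaluated at the epoch's starting point), and the function name `vr_rg_sketch` and the gradient oracle `grad_i` are illustrative placeholders, not identifiers from the paper.

```python
import numpy as np


def vr_rg_sketch(grad_i, x0, n, step_size, epochs, rng=None):
    """Illustrative sketch of a variance-reduced reshuffling gradient method.

    Assumptions (not from the paper): the variance-reduction step is an
    SVRG-style correction anchored at the epoch's starting point, and
    grad_i(x, i) returns the gradient of the i-th component function at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(epochs):
        # Snapshot point and full gradient used by the (assumed) correction term.
        x_snap = x.copy()
        full_grad = sum(grad_i(x_snap, i) for i in range(n)) / n
        # Reshuffling: visit every component exactly once per epoch,
        # in a fresh random order (sampling without replacement).
        for i in rng.permutation(n):
            # Variance-reduced stochastic gradient (assumed SVRG-style form).
            g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x = x - step_size * g  # constant step size, as stated in the abstract
    return x
```

The distributed variant described in the abstract (DVR-RG) would additionally mix local iterates with neighbors once per epoch; that single communication round per epoch is its distinguishing feature, but its precise update is not reproduced here.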