Stochastic Gradient Descent (SGD) is widely used in machine learning. Previous convergence analyses of SGD with vanishing step sizes typically assumed that the step sizes satisfy the Robbins-Monro conditions, namely that the sum of the step sizes is infinite while the sum of their squares is finite. In practice, a much wider variety of step-size schedules is used, many of which violate the Robbins-Monro conditions and therefore lack theoretical convergence guarantees. To bridge this gap between theory and practice, this paper introduces a novel analytical tool, a stopping-time method from probability theory, to study the asymptotic convergence of SGD under relaxed step-size conditions. In the non-convex setting, we prove almost sure convergence of the sequence of iterates generated by SGD when the step sizes satisfy $\sum_{t=1}^{+\infty} \epsilon_t = +\infty$ and $\sum_{t=1}^{+\infty} \epsilon_t^p < +\infty$ for some $p > 2$. Compared with previous work, our analysis eliminates the need to assume global Lipschitz continuity of the loss function and relaxes the requirement of globally bounded high-order moments of the stochastic gradient to local boundedness. Additionally, we prove $L_2$ convergence without assuming global boundedness of the loss function or its gradient. The assumptions required in this work are the weakest among studies reaching the same conclusions, thereby extending the applicability of SGD to practical scenarios where the traditional assumptions do not hold.
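As an illustration (not taken from the paper), the sketch below runs plain SGD on a toy least-squares problem with the schedule $\epsilon_t = t^{-2/5}$. This schedule violates the Robbins-Monro conditions, since $\sum_t \epsilon_t^2 = \sum_t t^{-4/5} = +\infty$, yet satisfies the relaxed conditions above with, for instance, $p = 3$ (because $\sum_t t^{-6/5} < +\infty$). The problem, data sizes, and constants are hypothetical and chosen only to make the example self-contained.

```python
# Minimal sketch: SGD on a toy least-squares objective with a step-size
# schedule eps_t = t^(-2/5) that lies outside the Robbins-Monro regime
# but satisfies sum eps_t = inf and sum eps_t^p < inf for p = 3 > 2.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))          # hypothetical design matrix
x_true = rng.normal(size=5)
b = A @ x_true + 0.1 * rng.normal(size=1000)

x = np.zeros(5)
for t in range(1, 20001):
    i = rng.integers(len(b))            # sample one data point
    grad = (A[i] @ x - b[i]) * A[i]     # stochastic gradient of 0.5*(A_i x - b_i)^2
    eps_t = t ** (-0.4)                 # step size outside the Robbins-Monro conditions
    x -= eps_t * grad

print("distance to minimizer:", np.linalg.norm(x - x_true))
```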