Abstract: Bounded variance, a Lipschitz gradient, and unbiased stochastic gradients are three key assumptions for ensuring the convergence and generalization of stochastic methods, especially in nonconvex settings. In practice, however, one or more of these assumptions may be violated, and this is the main focus of this paper. We show that by incorporating simple gradient normalization with momentum, SGD can guarantee convergence and generalization even in the presence of unbounded noise, a weakened gradient Lipschitz condition, and biased stochastic gradients caused by delays. These results substantially broaden the applicability of stochastic algorithms by relaxing the standard assumptions and providing more flexibility in real-world scenarios.
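As a minimal sketch of the kind of update the abstract describes, the snippet below combines a momentum buffer with gradient normalization, assuming the standard normalized-momentum form of SGD; the function names, hyperparameters, and test problem are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def normalized_sgd_momentum(grad_fn, x0, lr=0.01, beta=0.9, eps=1e-8, steps=1000):
    """Normalized SGD with momentum (illustrative sketch).

    grad_fn(x) returns a (possibly noisy or biased) stochastic gradient.
    The momentum buffer averages gradients; the step direction is the
    momentum vector scaled to unit norm, so each step stays bounded
    even when the raw gradient noise is unbounded.
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)                      # stochastic gradient estimate
        m = beta * m + (1.0 - beta) * g     # momentum: EMA of gradients
        x = x - lr * m / (np.linalg.norm(m) + eps)  # normalized step
    return x

# Example: minimize f(x) = ||x||^2 / 2 under heavy-tailed gradient noise,
# where the bounded-variance assumption fails.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + rng.standard_cauchy(size=x.shape)
x_final = normalized_sgd_momentum(noisy_grad, x0=np.ones(5) * 5.0, steps=5000)
print(x_final)
```

Because the step length is capped at `lr` regardless of the gradient magnitude, a single heavy-tailed noise sample cannot throw the iterate far off course, which is one intuition for why normalization helps under unbounded noise.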