Stochastic Gradient Descent (SGD) is widely used in machine learning. Previous convergence analyses of SGD with vanishing step sizes typically assumed that the step sizes satisfy the Robbins-Monro conditions, namely that the sum of the step sizes is infinite while the sum of their squares is finite. In practice, a much wider variety of step-size schedules is used, many of which violate the Robbins-Monro conditions and therefore lack theoretical convergence guarantees. To bridge this gap between theory and practice, this paper introduces a novel analytical tool, a stopping-time method from probability theory, to study the asymptotic convergence of SGD under relaxed step-size conditions. In the non-convex setting, we prove almost sure convergence of the sequence of iterates generated by SGD when the step sizes satisfy $\sum_{t=1}^{+\infty} \epsilon_t = +\infty$ and $\sum_{t=1}^{+\infty} \epsilon_t^p < +\infty$ for some $p > 2$. Compared with previous work, our analysis eliminates the need to assume global Lipschitz continuity of the loss function and relaxes the requirement of globally bounded high-order moments of the stochastic gradient to local boundedness. Additionally, we prove $L_2$ convergence without assuming global boundedness of the loss function or its gradient. The assumptions required in this work are the weakest among studies reaching the same conclusions, thereby extending the applicability of SGD to practical scenarios where the traditional assumptions do not hold.
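As an illustration (not taken from the paper), the sketch below runs plain SGD on a toy least-squares problem with the schedule $\epsilon_t = t^{-2/5}$. This schedule violates the Robbins-Monro conditions, since $\sum_t \epsilon_t^2 = \sum_t t^{-4/5} = +\infty$, yet satisfies the relaxed conditions above with, for instance, $p = 3$ (because $\sum_t t^{-6/5} < +\infty$). The problem, data sizes, and constants are hypothetical and chosen only to make the example self-contained.

```python
# Minimal sketch: SGD on a toy least-squares objective with a step-size
# schedule eps_t = t^(-2/5) that lies outside the Robbins-Monro regime
# but satisfies sum eps_t = inf and sum eps_t^p < inf for p = 3 > 2.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))          # hypothetical design matrix
x_true = rng.normal(size=5)
b = A @ x_true + 0.1 * rng.normal(size=1000)

x = np.zeros(5)
for t in range(1, 20001):
    i = rng.integers(len(b))            # sample one data point
    grad = (A[i] @ x - b[i]) * A[i]     # stochastic gradient of 0.5*(A_i x - b_i)^2
    eps_t = t ** (-0.4)                 # step size outside the Robbins-Monro conditions
    x -= eps_t * grad

print("distance to minimizer:", np.linalg.norm(x - x_true))
```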