Asymptotic theory of SGD with a general learning-rate

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0
Keywords: Learning Rate Schedules, Convergence Analysis, Stochastic Gradient Descent, Online Learning
Abstract: Stochastic gradient descent (SGD) with polynomially decaying step sizes has long underpinned theoretical analyses, yielding a broad spectrum of statistically attractive guarantees. In practice, however, such schedules are rarely used because of their prohibitively slow convergence, revealing a persistent gap between theory and empirical performance. In this paper, we introduce a unified framework that quantifies the uncertainty of online SGD under arbitrary learning-rate choices. In particular, we provide the first comprehensive convergence characterizations for two widely used but theoretically under-examined schemes: cyclical learning rates and linear decay to zero. Our results not only explain the observed behavior of these schedules but also provide principled tools for statistical inference and algorithm design. All theoretical findings are corroborated by extensive simulations across diverse settings.
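
For concreteness, the sketch below illustrates the three schedule families named in the abstract (polynomial decay, cyclical, and linear decay to zero) driving online SGD on a toy least-squares problem. The specific parameter values, function names, and the toy model are hypothetical choices for illustration only; they are not taken from the paper.

```python
import numpy as np

# Illustrative learning-rate schedules; gamma0, decay exponent, cycle length,
# and horizon are hypothetical values chosen only for demonstration.

def polynomial_decay(t, gamma0=0.5, alpha=0.5):
    """Classical polynomially decaying step size: gamma_t = gamma0 * (t+1)^(-alpha)."""
    return gamma0 * (t + 1) ** (-alpha)

def cyclical(t, gamma_min=0.01, gamma_max=0.5, cycle=100):
    """Triangular cyclical learning rate oscillating between gamma_min and gamma_max."""
    phase = (t % cycle) / cycle
    return gamma_min + (gamma_max - gamma_min) * (1 - abs(2 * phase - 1))

def linear_to_zero(t, gamma0=0.5, horizon=1000):
    """Linear decay from gamma0 down to zero over a fixed horizon."""
    return gamma0 * max(0.0, 1.0 - t / horizon)

def online_sgd(schedule, steps=1000, d=5, noise=0.1, seed=0):
    """Online SGD on a toy least-squares problem: one fresh sample per step."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d)        # ground-truth parameter
    theta = np.zeros(d)                    # SGD iterate
    for t in range(steps):
        x = rng.normal(size=d)             # fresh covariate (online setting)
        y = x @ theta_star + noise * rng.normal()
        grad = (x @ theta - y) * x         # stochastic gradient of the squared loss
        theta -= schedule(t) * grad        # step with the chosen learning-rate schedule
    return np.linalg.norm(theta - theta_star)

if __name__ == "__main__":
    for name, sched in [("polynomial", polynomial_decay),
                        ("cyclical", cyclical),
                        ("linear-to-zero", linear_to_zero)]:
        print(f"{name:>15}: final error {online_sgd(sched):.4f}")
```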
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 22779