Stab-SGD: Noise-Adaptivity in Smooth Optimization with Stability Ratios

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0
Keywords: learning theory, noise-adaptive hyperparameters, schedule-free gradient descent, smooth last-iterate optimization
TL;DR: Estimating the stability of stochastic gradient estimates provably yields an efficient, horizon-free adaptive algorithm with noise-independent hyperparameters.
Abstract: In the context of smooth stochastic optimization with first-order methods, we introduce the stability ratio of gradient estimates as a measure of the local relative noise level, ranging from zero for pure noise to one for negligible noise. We show that a schedule-free variant of stochastic gradient descent (Stab-SGD), obtained simply by shrinking the learning rate by the stability ratio, achieves genuine adaptivity to noise levels (i.e., without tuning hyperparameters to the gradient's variance), together with all the key properties of a good schedule-free algorithm: neither plateau nor explosion at initialization, and no saturation of the loss. We believe this theoretical development reveals the importance of estimating the local stability ratio when constructing well-behaved (last-iterate) schedule-free algorithms, particularly when the hyperparameter-tuning budget is a small fraction of the total budget, since noise-adaptivity and cheaper horizon-free tuning are most crucial in this regime.
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 12129
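
The abstract describes Stab-SGD as plain SGD with its learning rate shrunk by the stability ratio at each step. The sketch below illustrates that idea on a toy noisy quadratic; it is a minimal, hypothetical rendering, and the particular stability-ratio estimator used here (squared norm of the mean gradient divided by the mean squared per-sample gradient norm, clipped to [0, 1]) and all problem constants are illustrative assumptions, not the paper's exact construction or analysis.

```python
import numpy as np

# Illustrative sketch (assumed estimator, not necessarily the paper's):
# estimate a "stability ratio" rho in [0, 1] from per-sample gradients of a
# mini-batch -- close to 1 when the estimates agree (negligible noise),
# close to 0 when they are dominated by noise -- and shrink the SGD
# learning rate by rho at every step.

rng = np.random.default_rng(0)

# Toy problem: f(x) = 0.5 * ||x||^2, with stochastic gradients
# g_i(x) = x + noise_i, noise_i ~ N(0, sigma^2 I), drawn i.i.d. per sample.
dim, sigma, batch_size = 10, 1.0, 8
base_lr, steps = 0.5, 200

def per_sample_grads(x):
    """Return a (batch_size, dim) array of independent stochastic gradients."""
    noise = sigma * rng.standard_normal((batch_size, dim))
    return x[None, :] + noise

def stability_ratio(grads, eps=1e-12):
    """One possible estimator (an assumption made for this sketch):
    ||mean gradient||^2 / mean(||per-sample gradient||^2), clipped to [0, 1].
    It is ~1 when all estimates point the same way and ~0 for pure noise."""
    g_bar = grads.mean(axis=0)
    num = np.dot(g_bar, g_bar)
    den = np.mean(np.sum(grads**2, axis=1)) + eps
    return float(np.clip(num / den, 0.0, 1.0))

x = rng.standard_normal(dim)
for t in range(steps):
    grads = per_sample_grads(x)
    rho = stability_ratio(grads)
    # Stab-SGD-style update: plain SGD with the learning rate shrunk by rho.
    x = x - base_lr * rho * grads.mean(axis=0)

print("final loss:", 0.5 * np.dot(x, x))
```

As the iterates approach the optimum, the signal shrinks relative to the noise, so rho decreases and the effective learning rate base_lr * rho shrinks automatically, which is the noise-adaptive behavior the abstract attributes to the stability ratio.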