Keywords: Robust Convergence, AdaGrad, Last-Iterate, Almost Sure Convergence, Arbitrary Stopping Time
Abstract: AdaGrad has become a widely used algorithm for training deep models. Recently, the study of almost sure last-iterate convergence rates in stochastic optimization has attracted increasing attention, as it provides guarantees of stability and **robustness** for an arbitrary single trajectory. While such results are well understood for stochastic gradient descent (SGD), the corresponding analysis for AdaGrad remains limited. In this paper, we establish **almost sure** convergence rates of AdaGrad for the **last iterate** in the (strongly) convex setting and for the best iterate in the non-convex setting, both valid under **arbitrary** stopping times and with a flexible dependence on the gradient history.
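For context, a minimal sketch of the standard (diagonal) AdaGrad update that analyses of this kind typically consider, with step size $\eta > 0$ and a small constant $\epsilon \ge 0$ (the paper's exact variant, e.g. its scaling of the denominator, may differ):

$$
x_{t+1} = x_t - \frac{\eta}{\sqrt{\epsilon + \sum_{s=1}^{t} g_s \odot g_s}} \odot g_t, \qquad g_t \approx \nabla f(x_t) \text{ (stochastic gradient)},
$$

where the square root, division, and product $\odot$ are taken coordinate-wise, so each coordinate's effective step size adapts to that coordinate's accumulated gradient history.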
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 9313