Keywords: Robust Convergence, AdaGrad, Last-Iterate, Almost Sure Convergence, Arbitrary Stopping Time
Abstract: AdaGrad has become a widely used algorithm for training deep models. Recently, the study of almost sure last-iterate convergence rates in stochastic optimization has attracted increasing attention, as it provides guarantees of stability and **robustness** for an arbitrary single trajectory. While such results are well understood for stochastic gradient descent (SGD), the corresponding analysis for AdaGrad remains limited. In this paper, we establish **almost sure** convergence rates of AdaGrad for the **last iterate** in the (strongly) convex setting and for the best iterate in the non-convex setting, both valid under **arbitrary** stopping times and with a flexible dependence on the gradient history.
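For context, a minimal sketch of the standard (diagonal) AdaGrad update that analyses of this kind typically consider, with step size $\eta > 0$ and a small constant $\epsilon \ge 0$ (the paper's exact variant, e.g. its scaling of the denominator, may differ):

$$
x_{t+1} = x_t - \frac{\eta}{\sqrt{\epsilon + \sum_{s=1}^{t} g_s \odot g_s}} \odot g_t, \qquad g_t \approx \nabla f(x_t) \text{ (stochastic gradient)},
$$

where the square root, division, and product $\odot$ are taken coordinate-wise, so each coordinate's effective step size adapts to that coordinate's accumulated gradient history.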
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 9313