Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

Antoine Godichon-Baggioni; Pierre Tarrago

Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

Antoine Godichon-Baggioni, Pierre Tarrago

Published: 10 Jun 2025, Last Modified: 10 Jun 2025Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0

Abstract: In stochastic optimization, a widely used approach for handling large samples sequentially is the stochastic gradient algorithm (SGD). However, a key limitation of SGD is that its step size sequence remains uniform across all gradient directions, which can lead to poor performance in practice, particularly for ill-conditioned problems. To address this issue, adaptive gradient algorithms, such as Adagrad and stochastic Newton methods, have been developed. These algorithms adapt the step size to each gradient direction, providing significant advantages in such challenging settings. This paper focuses on the non-asymptotic analysis of these adaptive gradient algorithms for strongly convex objective functions. The theoretical results are further applied to practical examples, including linear regression and regularized generalized linear models, using both Adagrad and stochastic Newton algorithms.

Submission Length: Long submission (more than 12 pages of main content)

Changes Since Last Submission: We did a last carefull reading of the proofs and remove the text in red.

Code: https://godichon.perso.math.cnrs.fr/Codes_Newton_TMLR.zip

Assigned Action Editor: ~Robert_M._Gower1

Submission Number: 3986

Loading