Abstract: In stochastic optimization, a widely used approach for handling large samples sequentially is stochastic gradient descent (SGD). However, a key limitation of SGD is that its step size sequence is uniform across all gradient directions, which can lead to poor performance in practice, particularly for ill-conditioned problems. To address this issue, adaptive gradient algorithms, such as Adagrad and stochastic Newton methods, have been developed. These algorithms adapt the step size to each gradient direction, providing significant advantages in such challenging settings. This paper focuses on the non-asymptotic analysis of these adaptive gradient algorithms for strongly convex objective functions. The theoretical results are further applied to practical examples, including linear regression and regularized generalized linear models, using both Adagrad and stochastic Newton algorithms.
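The contrast the abstract draws between SGD's uniform step size and Adagrad's per-direction adaptation can be illustrated with a minimal sketch. The update rule below is standard (diagonal) Adagrad; the learning rate, test function, and iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.1, eps=1e-8):
    """One diagonal-Adagrad update: accumulate squared gradients
    coordinate-wise, then scale the step per coordinate."""
    G = G + grad ** 2
    # Steep directions accumulate large G and get smaller effective steps,
    # which is the adaptation that helps on ill-conditioned problems.
    theta = theta - lr * grad / (np.sqrt(G) + eps)
    return theta, G

# Ill-conditioned quadratic f(x) = 0.5 * (100 * x0^2 + x1^2):
# curvature along x0 is 100 times that along x1.
def grad_f(x):
    return np.array([100.0 * x[0], x[1]])

x = np.array([1.0, 1.0])
G = np.zeros(2)
for _ in range(200):
    x, G = adagrad_step(x, grad_f(x), G)
```

A fixed-step SGD run on the same function would need a step size small enough for the stiff `x0` direction, slowing progress along `x1`; Adagrad's per-coordinate scaling sidesteps that trade-off.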
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: A previous version of this submission was rejected because it was not anonymous. We are not sure whether we should send the link.
Assigned Action Editor: ~Robert_M._Gower1
Submission Number: 3986