High probability bounds on AdaGrad for constrained weakly convex optimization

Yusu Hong, Junhong Lin

Published: 01 Jan 2025, Last Modified: 28 Sept 2024, J. Complex. 2025, CC BY-SA 4.0

Abstract: In this paper, we study the high probability convergence of AdaGrad-Norm for constrained, non-smooth, weakly convex optimization in both the bounded noise and sub-Gaussian noise cases. We also investigate a more general accelerated gradient descent (AGD) template (Ghadimi and Lan, 2016) that, under different parameter choices, encompasses AdaGrad-Norm, Nesterov's accelerated gradient descent, and RSAG (Ghadimi and Lan, 2016). We provide a high probability convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ that does not require knowledge of the weak convexity parameter or the gradient bound to tune the step-sizes.
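For readers unfamiliar with the method, the following is a minimal sketch of a projected AdaGrad-Norm iteration in its commonly stated form: a single scalar step-size eta / b_t, where b_t^2 accumulates squared stochastic (sub)gradient norms, followed by a Euclidean projection onto the constraint set. It is an illustration under stated assumptions, not the paper's exact algorithm or analysis setting. The constraint set (an l2 ball), the toy objective f(x) = ||x||_1 (convex, hence trivially weakly convex, and non-smooth), the uniform bounded noise, and all function and parameter names (`project_l2_ball`, `adagrad_norm`, `noisy_l1_subgrad`, `eta`, `b0`, `radius`) are hypothetical choices made for this example.

```python
import math
import random


def project_l2_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [v * scale for v in x]


def adagrad_norm(subgrad, x0, steps, eta=1.0, b0=1e-8, radius=1.0):
    """Projected AdaGrad-Norm: one scalar step-size shared by all coordinates.

    No weak convexity parameter or gradient bound is passed in; the
    step-size adapts through the accumulated squared gradient norms.
    """
    x = project_l2_ball(x0, radius)
    b_sq = b0 * b0
    for _ in range(steps):
        g = subgrad(x)
        b_sq += sum(v * v for v in g)      # b_t^2 = b_{t-1}^2 + ||g_t||^2
        step = eta / math.sqrt(b_sq)       # step-size eta / b_t
        x = project_l2_ball(
            [xi - step * gi for xi, gi in zip(x, g)], radius
        )
    return x


# Assumed toy setup: f(x) = ||x||_1 with bounded noise in [-0.1, 0.1].
_rng = random.Random(0)


def noisy_l1_subgrad(x):
    """A subgradient of ||x||_1 plus bounded uniform noise per coordinate."""
    def sign(v):
        return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)
    return [sign(v) + _rng.uniform(-0.1, 0.1) for v in x]


x_final = adagrad_norm(noisy_l1_subgrad, [0.9, -0.5], steps=2000)
print(x_final)
```

In this toy run the iterates stay inside the unit ball and settle near the minimizer at the origin, with an oscillation on the order of the current step-size. The sketch is not meant to reproduce the rate or the high probability guarantees claimed in the abstract.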