SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms

TMLR Paper 5972 Authors

23 Sept 2025 (modified: 28 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: Adaptive optimization methods are among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam, AdaGrad, and AdaHessian employ a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. Despite their adaptive characteristics, however, these methods still require manual tuning of the step-size, which in turn affects the time required to solve a given problem. This paper presents an optimization framework named \textbf{SANIA} to tackle these challenges. Beyond eliminating the need to tune the step-size hyperparameter manually, SANIA incorporates techniques to address poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including \textit{Hutchinson's method}, which approximates the diagonal of the Hessian of the loss function. We conclude with an extensive empirical evaluation of the proposed techniques on classification tasks, covering both convex and non-convex settings.
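As background for the abstract's two key ingredients: a Polyak-type method sets the step size from the current loss gap, e.g. the classical rule $\gamma_k = \left(f(x_k) - f^*\right) / \|\nabla f(x_k)\|^2$, while Hutchinson's method estimates the Hessian diagonal as $\mathrm{diag}(H) \approx \mathbb{E}\left[z \odot Hz\right]$ for Rademacher vectors $z$. The sketch below illustrates the latter in PyTorch; it is a generic illustration of the estimator, not the paper's own implementation, and the function name `hutchinson_diag` and the default sample count are assumptions made here.

```python
import torch

def hutchinson_diag(loss, params, num_samples=10):
    """Estimate diag(H), the Hessian diagonal of `loss` w.r.t. `params`,
    via Hutchinson's method: diag(H) ~= E[z * (H z)] for Rademacher z.
    (Illustrative sketch; not the paper's implementation.)"""
    params = list(params)
    # First-order gradients with create_graph=True so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag_est = [torch.zeros_like(p) for p in params]
    for _ in range(num_samples):
        # Rademacher probe vectors with entries +1 or -1.
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector products H z via a second differentiation of the gradient.
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hvp in zip(diag_est, zs, hvps):
            d += z * hvp / num_samples  # running average of z * (H z)
    return diag_est
```

The Hessian-vector products are obtained by differentiating the gradient a second time, so the Hessian is never formed explicitly; the cost per probe is roughly that of one extra backward pass, which is what makes diagonal preconditioners of this kind practical for DNN training.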
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yunwen_Lei1
Submission Number: 5972