Keywords: Adaptive Optimization, Linear Stability Analysis, Generalization, Loss Surface Geometry, Deep Neural Networks
Abstract: Adaptive optimization algorithms, such as Adam (Kingma & Ba, 2015) and RMSProp (Tieleman & Hinton, 2012), have become integral to training deep neural networks, yet their stability properties and impact on generalization remain poorly understood (Wilson et al., 2017). This paper extends linear stability analysis to adaptive optimizers, providing a theoretical framework that explains their behavior in relation to loss surface geometry (Wu et al., 2022; Jastrzębski et al., 2019). We introduce a novel generalized coherence measure that quantifies the interaction between the adaptive preconditioner and the Hessian of the loss function. This measure yields necessary and sufficient conditions for linear stability near stationary points, offering insights into why adaptive methods may converge to sharper minima with poorer generalization.
Our analysis leads to practical guidelines for hyperparameter tuning, demonstrating how to improve the generalization performance of adaptive optimizers. Through extensive experiments on benchmark datasets and architectures, including ResNet (He et al., 2016) and Vision Transformers (Dosovitskiy et al., 2020), we validate our theoretical predictions, showing that aligning the adaptive preconditioner with the loss surface geometry through careful parameter selection can narrow the generalization gap between adaptive methods and SGD (Loshchilov & Hutter, 2018).
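To make the notion of linear stability for a preconditioned optimizer concrete, here is a minimal sketch under the simplest possible assumptions. Near a stationary point θ*, the adaptive preconditioner is treated as frozen to a positive diagonal matrix P, and momentum is ignored. Under those assumptions the linearized update is δ_{t+1} = (I − ηPH)δ_t. For a symmetric positive semidefinite Hessian H, PH has the same eigenvalues as P^{1/2}HP^{1/2}, so the map is non-expansive (spectral radius at most 1) if and only if η·λ_max(P^{1/2}HP^{1/2}) ≤ 2. This is the textbook criterion for fixed-preconditioner gradient descent, not the paper's generalized coherence measure. The function names and the Adam-style form 1/(√v + ε) are illustrative assumptions.

```python
import numpy as np


def adam_preconditioner(v, eps):
    """Hypothetical frozen Adam-style diagonal preconditioner 1 / (sqrt(v) + eps).

    Assumes the second-moment estimate v is fixed near the stationary point
    and ignores momentum; this is an illustrative simplification.
    """
    return 1.0 / (np.sqrt(v) + eps)


def preconditioned_sharpness(hessian, precond_diag):
    """Largest eigenvalue of P^{1/2} H P^{1/2} for a positive diagonal P.

    P H is similar to P^{1/2} H P^{1/2}, so for a symmetric PSD Hessian the
    eigenvalues of P H are real and non-negative.
    """
    sqrt_p = np.sqrt(precond_diag)
    sym = sqrt_p[:, None] * hessian * sqrt_p[None, :]  # P^{1/2} H P^{1/2}
    return float(np.linalg.eigvalsh(sym).max())


def max_stable_lr(hessian, precond_diag):
    """Largest learning rate eta satisfying eta * sharpness <= 2."""
    return 2.0 / preconditioned_sharpness(hessian, precond_diag)


def is_linearly_stable(hessian, precond_diag, lr, tol=1e-12):
    """Check that the linearized map I - lr * P H has spectral radius <= 1."""
    n = len(precond_diag)
    # precond_diag[:, None] * hessian scales row i of H by P_i, i.e. diag(P) @ H.
    jacobian = np.eye(n) - lr * (precond_diag[:, None] * hessian)
    spectral_radius = np.max(np.abs(np.linalg.eigvals(jacobian)))
    return bool(spectral_radius <= 1.0 + tol)


if __name__ == "__main__":
    H = np.array([[10.0, 2.0], [2.0, 1.0]])  # toy PSD Hessian at a minimum
    P = np.array([0.5, 2.0])                 # toy frozen preconditioner
    # Here P^{1/2} H P^{1/2} = [[5, 2], [2, 2]], with eigenvalues 6 and 1.
    print(preconditioned_sharpness(H, P), max_stable_lr(H, P))
    print(is_linearly_stable(H, P, 0.3), is_linearly_stable(H, P, 0.4))
```

In this toy setting, λ_max(P^{1/2}HP^{1/2}) is monotone in P for PSD H. Increasing ε therefore shrinks the frozen preconditioner and raises the largest stable learning rate. Whether this matches the paper's specific tuning guidelines is not established by this sketch.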
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13895