Abstract: In this paper, we investigate the novel optimizer Lion (Evolved Sign Momentum), which outperforms the well-established Adam across a wide range of tasks. Lion combines Sign Gradient Descent (SignGD) with momentum, using a fixed step size and setting the update direction via a sign operation. Despite its promising results, Lion currently lacks comprehensive theoretical justification. We also discuss Normalized Gradient Descent (NormGD) methods, which likewise use a fixed step size and predate Lion. We show that both Lion and NormGD have notable disadvantages, and to address these issues we propose a new method, SepNorm, which normalizes gradients separately across parameter groups. SepNorm generalizes both Lion and NormGD, offering a more adaptable and stable optimization approach. Our theoretical analysis on quadratic functions reveals the convergence mechanisms behind these methods and allows us to formulate implicit-bias criteria for them. Additionally, we introduce OrtSepNorm, an extension of SepNorm that makes the update direction orthogonal to the weights, and we demonstrate that OrtSepNorm converges to a fixed weight norm, thereby making training more stable. Empirical evaluations show that SepNorm and OrtSepNorm outperform both Lion and Adam on a range of computer vision (CV) and natural language processing (NLP) tasks.
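To make the update rules concrete, here is a minimal sketch (not the authors' code) of a Lion-style step, as publicly described for Lion, together with a per-parameter-group normalized step illustrating the kind of separate normalization SepNorm refers to. The hyperparameter names (lr, beta1, beta2, wd, eps) and the exact SepNorm formula are assumptions for illustration only.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion-style step: fixed step size, direction given by the sign of an
    interpolation between the momentum and the current gradient."""
    update = np.sign(beta1 * m + (1.0 - beta1) * g)  # sign -> fixed per-coordinate step
    w_new = w - lr * (update + wd * w)               # decoupled weight decay
    m_new = beta2 * m + (1.0 - beta2) * g            # momentum tracks the gradient
    return w_new, m_new

def sepnorm_step(groups, grads, lr=1e-3, eps=1e-8):
    """Illustrative per-group normalized step (assumed form): each parameter
    group's gradient is scaled by the inverse of its own Euclidean norm, so
    every group moves with the same fixed step length."""
    return [w - lr * g / (np.linalg.norm(g) + eps) for w, g in zip(groups, grads)]

# Tiny usage example on the quadratic loss 0.5*||w||^2, whose gradient is w.
w, m = np.ones(4), np.zeros(4)
for _ in range(3):
    w, m = lion_step(w, w, m)
print(w)
```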
Keywords: Optimization, Lion, Deep Learning
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11027