A Gradient Descent Optimizer with Auto-Controlled Large Learning Rates, Dynamic Batch Sizes and without Momentum

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Machine Learning, ICLR, Optimization
TL;DR: We present a novel, fast optimizer with self-adjusting learning rates and batch sizes, without momentum.
Abstract: We present a novel, fast, gradient-based, momentum-free optimization algorithm with a dynamic learning rate and a dynamic batch size. The main ideas are to exponentially adapt the learning rate $ \alpha $ through situational awareness, mainly striving for orthogonal neighboring gradients, and to increase the batch size when the gradients become too noisy, which would lead to random walks rather than gradient descent. The method has a high success rate, converges quickly, and relies on only a few hyper-parameters, providing greater universality. It scales only linearly ($O(n)$) with the dimension and is rotation invariant, thereby overcoming known limitations. The optimization method is termed ELRA (Exponential Learning Rate Adaption). The strong performance of ELRA is demonstrated by experiments on several benchmark datasets (ranging from MNIST to ImageNet) against common optimizers such as Adam, Lion and SGD.
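A minimal toy sketch of the two mechanisms the abstract describes: (1) exponential adaptation of the learning rate $\alpha$ driven by the cosine between consecutive gradients, which leaves $\alpha$ unchanged exactly when the gradients are orthogonal, and (2) growing the batch size when the gradient signal looks too noisy. The constants, the noise proxy and the function names below are illustrative assumptions, not the authors' ELRA implementation.

```python
# Illustrative ELRA-style step (sketch only, not the paper's algorithm).
import numpy as np

def elra_like_step(x, grad_fn, alpha, prev_grad, batch_size,
                   rate=0.5, noise_threshold=0.2):
    """One momentum-free descent step with exponential learning-rate adaption.

    grad_fn(x, batch_size) is assumed to return a stochastic mini-batch gradient.
    """
    g = grad_fn(x, batch_size)
    if prev_grad is not None:
        # Cosine between the current and previous gradient.
        cos = g @ prev_grad / (np.linalg.norm(g) * np.linalg.norm(prev_grad) + 1e-12)
        # Exponential update: aligned gradients (cos > 0) grow alpha, opposed
        # gradients (cos < 0) shrink it, orthogonal gradients (cos = 0) keep it.
        alpha *= np.exp(rate * cos)
        # Hypothetical noise proxy: a gradient norm that collapses relative to
        # the previous one suggests averaging of random directions; enlarge the
        # batch to restore signal.
        if np.linalg.norm(g) < noise_threshold * np.linalg.norm(prev_grad):
            batch_size *= 2
    x_new = x - alpha * g  # plain gradient step, no momentum term
    return x_new, alpha, g, batch_size
```

Because the update uses only the current parameters, the current gradient and the previous gradient, the cost per step stays linear in the dimension, matching the $O(n)$ scaling claimed above.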
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10502