Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Optimizer, Line Search, Learning Rate, Transformer, CNN
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In recent studies, line search methods have been shown to significantly
enhance the performance of conventional stochastic gradient descent techniques
across various datasets and architectures, while rendering an otherwise critical choice
of learning rate schedule superfluous (Vaswani et al., 2019; Mahsereci & Hennig,
2015; Vaswani et al., 2021). In this paper, we identify problems with the current
state-of-the-art line search methods (Vaswani et al., 2019; 2021), propose enhancements,
and rigorously assess their effectiveness. Furthermore, we evaluate these methods
on datasets that are orders of magnitude larger, and on more complex data domains,
than previously considered.
More specifically, we enhance the Armijo line search method by speeding up
its computation and incorporating a momentum term into the Armijo criterion,
making it better suited for stochastic mini-batching. Our optimization approach
outperforms both the previous Armijo implementation and a tuned learning rate
schedule for the Adam and SGD optimizers. Our evaluation covers a diverse range
of architectures, such as Transformers, CNNs, and MLPs, as well as data domains,
including NLP and image data.
Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.
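For orientation, the sketch below shows the classical stochastic Armijo backtracking rule that this line of work builds on, written against the PyTorch API. The function name `armijo_step`, the constants, and the toy closure are illustrative assumptions; the momentum-augmented acceptance criterion proposed in the paper is not reproduced here.

```python
import torch

def armijo_step(params, grads, closure, lr=1.0, c=0.1, shrink=0.7, max_backtracks=20):
    """Backtracking line search on one mini-batch using the Armijo
    sufficient-decrease test: f(x - t*g) <= f(x) - c * t * ||g||^2."""
    loss0 = closure().detach()
    grad_sq = sum((g * g).sum() for g in grads)
    step = lr
    for _ in range(max_backtracks):
        # Trial step x <- x - step * g (in place, outside autograd tracking).
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=-step)
        trial_loss = closure().detach()
        # Plain Armijo test; the paper's variant additionally folds a
        # momentum term into this acceptance criterion.
        if trial_loss <= loss0 - c * step * grad_sq:
            return step, trial_loss
        # Reject: undo the trial step and shrink the step size.
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=step)
        step *= shrink
    return step, loss0


# Illustrative usage on a toy least-squares mini-batch.
w = torch.randn(10, requires_grad=True)
x, y = torch.randn(32, 10), torch.randn(32)

def closure():
    return ((x @ w - y) ** 2).mean()

closure().backward()
step, loss = armijo_step([w], [w.grad], closure)
print(f"accepted step size {step:.4f}, mini-batch loss {loss.item():.4f}")
```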
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1753