Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: second-order optimizer, hardware efficiency, approximate preconditioning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose Jorge, a new second-order optimizer that is extremely efficient on GPU platforms.
Abstract: Despite their better convergence properties compared to first-order optimizers,
second-order optimizers for deep learning have been less popular due to their
significant computational costs. The primary efficiency bottleneck in such
optimizers is the matrix inversion required in the preconditioning step, which
is expensive to compute on GPUs. In this paper, we introduce Jorge, a
second-order optimizer that promises the best of both worlds: the rapid
convergence of second-order methods and the high computational efficiency
typical of first-order methods. We eliminate the primary computational
bottleneck, explicit matrix inversion, by approximating the preconditioner
computation, which makes Jorge extremely efficient on GPUs in terms of
wall-clock time. Further, we describe an approach to determine Jorge's
hyperparameters directly from a well-tuned SGD baseline, substantially reducing
tuning effort. Our empirical evaluations demonstrate Jorge's distinct
advantages: it outperforms state-of-the-art optimizers such as SGD, AdamW, and
Shampoo across multiple deep learning models, in both sample efficiency and
wall-clock time.
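Illustrative sketch (not the authors' exact update rule): one standard way to avoid explicit matrix inversion in a Shampoo-like preconditioner is to approximate the inverse of the symmetric statistic with a truncated Neumann series, which uses only matrix multiplications and therefore maps well to GPUs. The function name neumann_inverse, the damping constant, and the number of series terms below are illustrative assumptions; Jorge's actual approximation and preconditioner powers may differ.

```python
import torch

def neumann_inverse(A, num_terms=8, damping=1e-4):
    """Approximate A^{-1} for a symmetric PSD matrix A with a truncated
    Neumann series: A^{-1} ~= (1/c) * sum_{k=0}^{K} (I - A/c)^k,
    valid when the spectral radius of (I - A/c) is below 1.
    Uses only matmuls, so no explicit matrix inverse is computed."""
    n = A.shape[0]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    A = A + damping * I                      # damping makes A strictly positive definite
    c = torch.linalg.matrix_norm(A, ord=2)   # scale so eigenvalues of A/c lie in (0, 1]
    M = I - A / c
    term, acc = I.clone(), I.clone()
    for _ in range(num_terms):
        term = term @ M                      # accumulates (I - A/c)^k via matmuls only
        acc = acc + term
    return acc / c

# Toy usage: precondition a gradient with the approximate inverse.
torch.manual_seed(0)
G = torch.randn(64, 64)
A = G @ G.T / 64                             # toy SPD statistic (stand-in for a second moment)
grad = torch.randn(64)
precond_grad = neumann_inverse(A) @ grad     # no call to torch.linalg.inv
```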
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8677