Generalised Lookahead Optimiser

01 Mar 2023 (modified: 31 May 2023) · Submitted to Tiny Papers @ ICLR 2023
Keywords: lookahead optimizer
TL;DR: We generalize the Lookahead optimizer by quadratically approximating the loss within a trust region informed by the last k steps and optimizing that approximation.
Abstract: The vast majority of deep learning models are trained using SGD or one of its variants. Zhang et al. (2019) proposed the Lookahead optimiser as an alternative that achieves remarkable test performance on many established datasets and models. In this work we investigate a generalisation of this optimisation method. We validate the method empirically, generally demonstrating better results and faster convergence relative to the baselines of SGD and Lookahead.
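For context, the baseline this work generalises is the Lookahead update rule of Zhang et al. (2019): k fast optimiser steps followed by an interpolation of the slow weights towards the fast weights. Below is a minimal, hedged sketch of that baseline (not the paper's generalisation) using plain SGD as the inner optimiser; the function name, hyperparameter values, and the toy quadratic loss are illustrative assumptions only.

```python
# Minimal sketch of the baseline Lookahead update rule (Zhang et al., 2019),
# which this paper generalises. Names and hyperparameters are illustrative.
import numpy as np

def lookahead_sgd(grad_fn, w0, lr=0.1, k=5, alpha=0.5, outer_steps=20):
    """Lookahead wrapped around plain SGD.

    grad_fn     : callable returning the gradient at a point
    w0          : initial parameters (NumPy array)
    k           : number of fast (inner) SGD steps per slow update
    alpha       : slow-weight interpolation coefficient
    outer_steps : number of slow-weight updates
    """
    slow = w0.copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):              # k fast SGD steps
            fast -= lr * grad_fn(fast)
        # Slow weights move part of the way towards the final fast weights.
        slow += alpha * (fast - slow)
    return slow

# Toy example: minimise f(w) = 0.5 * ||w||^2, whose gradient is w.
w_star = lookahead_sgd(lambda w: w, np.array([3.0, -2.0]))
print(w_star)  # close to the minimiser [0, 0]
```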