Keywords: lookahead optimizer
TL;DR: We generalize the Lookahead optimizer by quadratically approximating the loss within a trust region informed by the last k steps and optimizing that approximation.
Abstract: The vast majority of deep learning models are trained using SGD or one of its variants. Zhang et al. (2019) suggested the Lookahead optimiser as an alternative which enjoys remarkable test performance on many established datasets and models. In this work we investigate a generalisation of this optimisation method. We validate the method empirically, generally demonstrating better results and faster convergence relative to the baselines of SGD and Lookahead.
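A minimal NumPy sketch of the idea described in the TL;DR: run the usual Lookahead inner loop, then, instead of the fixed interpolation slow + alpha * (fast - slow) of Zhang et al. (2019), fit a quadratic to the loss along the slow-to-fast direction and step to its minimiser within that segment, treating the segment as a one-dimensional trust region informed by the last k steps. The function names and the one-dimensional quadratic fit are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def lookahead_quadratic(w0, loss_fn, grad_fn, k=5, lr=0.1, outer_steps=50):
    slow = np.asarray(w0, dtype=float)
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):                    # k fast (inner) SGD steps
            fast = fast - lr * grad_fn(fast)
        d = fast - slow                       # direction spanned by the inner loop
        ts = np.linspace(0.0, 1.0, k + 1)     # sample the segment slow -> fast
        losses = [loss_fn(slow + t * d) for t in ts]
        a, b, c = np.polyfit(ts, losses, 2)   # quadratic model of the loss along d
        if a > 0:
            # Convex fit: step to its minimiser, clipped to the trust region [0, 1].
            t_star = np.clip(-b / (2.0 * a), 0.0, 1.0)
        else:
            # Non-convex fit: fall back to plain Lookahead (alpha = 0.5 in Zhang et al., 2019).
            t_star = 0.5
        slow = slow + t_star * d
    return slow

# Toy usage on a quadratic bowl: loss(w) = ||w||^2 / 2, grad(w) = w.
w = lookahead_quadratic(np.ones(10), lambda w: 0.5 * w @ w, lambda w: w)
```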