Reverse engineering learned optimizers reveals known and novel mechanismsDownload PDF

21 May 2021, 20:47 (edited 24 Jan 2022)NeurIPS 2021 PosterReaders: Everyone
  • Keywords: optimization, learned optimizers, reverse engineering, RNNs
  • TL;DR: We reverse engineer learned optimizers trained on a simple tasks and show that they learn interpretable and intuitive mechanisms.
  • Abstract: Learned optimizers are parametric algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to better performance, their inner workings remain a mystery. How is a given learned optimizer able to outperform a well tuned baseline? Has it learned a sophisticated combination of existing optimization techniques, or is it implementing completely new behavior? In this work, we address these questions by careful analysis and visualization of learned optimizers. We study learned optimizers trained from scratch on four disparate tasks, and discover that they have learned interpretable behavior, including: momentum, gradient clipping, learning rate schedules, and new forms of learning rate adaptation. Moreover, we show how dynamics and mechanisms inside of learned optimizers orchestrate these computations. Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: https://github.com/google-research/reverse-engineering-neural-networks
15 Replies

Loading