Reverse engineering learned optimizers reveals known and novel mechanisms

28 Sept 2020 (modified: 05 May 2023)
ICLR 2021 Conference Blind Submission
Readers: Everyone
Keywords: learned optimizers, optimization, recurrent neural networks, RNNs, interpretability
Abstract: Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from intuitive principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to optimizers with better performance in certain settings, their inner workings remain a mystery. How is it that a learned optimizer is able to outperform a well-tuned baseline? Has it learned a sophisticated method for combining existing optimization techniques, or is it implementing completely new behavior? In this work, we address these questions by visualizing and understanding learned optimizers. We study learned optimizers trained from scratch on three disparate tasks, and discover that they have learned interpretable mechanisms, including momentum, gradient clipping, schedules, and a new form of learning rate adaptation. Moreover, we show how the dynamics of trained learned optimizers enable these behaviors. Our results elucidate the previously murky understanding of what learned optimizers learn, and establish tools for interpreting future learned optimizers.
One-sentence Summary: We demonstrate that learned optimizers, parameterized by recurrent networks, learn interpretable mechanisms (momentum, gradient clipping, schedules, and learning rate adaptation).
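To make the contrast in the abstract concrete, here is a minimal illustrative sketch (not the authors' implementation) comparing a hand-designed momentum update with a per-parameter update produced by a small recurrent cell. The cell weights W_h, W_g, and w_out are hypothetical placeholders; in the paper's setting such parameters would be meta-trained rather than set by hand.

```python
# Illustrative sketch only: contrasts a hand-designed momentum rule with an
# RNN-parameterized "learned" update. The recurrent weights below are random,
# hypothetical placeholders, not a meta-trained optimizer.
import numpy as np

def momentum_update(param, grad, velocity, lr=0.1, beta=0.9):
    """Baseline optimizer: a simple, hand-designed update rule."""
    velocity = beta * velocity + grad
    return param - lr * velocity, velocity

def learned_update(param, grad, hidden, W_h, W_g, w_out):
    """Learned optimizer: a per-parameter recurrent cell maps the gradient
    (plus its hidden state, which carries history) to an update. Behaviors
    such as momentum, clipping, schedules, or learning rate adaptation can
    in principle emerge inside this nonlinear map."""
    hidden = np.tanh(W_h @ hidden + W_g * grad)  # update recurrent state
    update = w_out @ hidden                      # readout produces the step
    return param - update, hidden

# Toy usage on a 1-D quadratic loss f(x) = x^2 with gradient 2x.
rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4)) * 0.1
W_g = rng.normal(size=4) * 0.1
w_out = rng.normal(size=4) * 0.1
x_m, v = 5.0, 0.0
x_l, h = 5.0, np.zeros(4)
for _ in range(100):
    x_m, v = momentum_update(x_m, 2 * x_m, v)
    x_l, h = learned_update(x_l, 2 * x_l, h, W_h, W_g, w_out)
print(f"momentum: {x_m:.4f}, learned (untrained cell): {x_l:.4f}")
```

The point of the sketch is the shape of the parameterization, not performance: with untrained weights the recurrent cell does nothing useful, but once meta-trained, the same architecture can encode the mechanisms the paper reports.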
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=mt8HjG1Nv
15 Replies
