- Abstract: We consider the learning-to-learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models. A natural way to tackle this problem is to replace the human-designed optimizer with an LSTM network and train its parameters on some simple optimization problems (Andrychowicz et al., 2016). Despite their success compared to traditional optimizers such as SGD on a short horizon, these learnt (meta-)optimizers suffer from two key deficiencies: they fail to converge (or can even diverge) on a longer horizon (e.g., 10,000 steps), and they often fail to generalize to new tasks. To address the convergence problem, we rethink the architecture design of the meta-optimizer and develop an embarrassingly simple, yet powerful form of meta-optimizer: a coordinate-wise RNN model. We provide insights into the problems with the previous designs of each component and redesign our SimpleOptimizer to resolve those issues. Furthermore, we propose a new mechanism for sharing information between coordinates, which enables the meta-optimizer to exploit second-order information with negligible overhead. With these designs, our proposed SimpleOptimizer outperforms previous meta-optimizers and can successfully converge to optimal solutions in the long run. Moreover, our empirical results show that these benefits can be obtained with much smaller models than the previous ones.
- Code: https://anonymous.4open.science/r/969ae045-a211-4138-bd2d-d9a5d99192af/
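
To make the term "coordinate-wise RNN" in the abstract concrete, here is a minimal sketch of the general idea: one small recurrent cell, with weights shared across all coordinates, maps each coordinate's gradient and hidden state to an update for that coordinate. This is not the authors' SimpleOptimizer. The cell form (a plain tanh RNN), the hidden size, and the names `init_optimizer`, `rnn_cell`, and `coordinatewise_step` are assumptions made for illustration; the weights are random rather than meta-trained, and the information-sharing mechanism mentioned in the abstract is not modeled.

```python
import math
import random


def init_optimizer(hidden_size=4, seed=0):
    """Create one set of RNN weights, shared by every coordinate (hypothetical layout)."""
    rng = random.Random(seed)

    def small():
        return rng.gauss(0.0, 0.1)

    return {
        "w_in": [small() for _ in range(hidden_size)],
        "w_rec": [[small() for _ in range(hidden_size)] for _ in range(hidden_size)],
        "b": [0.0] * hidden_size,
        "w_out": [small() for _ in range(hidden_size)],
    }


def rnn_cell(params, g, h):
    """One recurrent step for a single coordinate: scalar gradient g, hidden vector h."""
    size = len(h)
    new_h = [
        math.tanh(
            params["w_in"][j] * g
            + sum(params["w_rec"][k][j] * h[k] for k in range(size))
            + params["b"][j]
        )
        for j in range(size)
    ]
    # The proposed update for this coordinate is a linear readout of the new state.
    update = sum(params["w_out"][j] * new_h[j] for j in range(size))
    return update, new_h


def coordinatewise_step(params, grads, hiddens):
    """Apply the same cell independently to every coordinate of the gradient."""
    results = [rnn_cell(params, g, h) for g, h in zip(grads, hiddens)]
    updates = [u for u, _ in results]
    new_hiddens = [h for _, h in results]
    return updates, new_hiddens


# Usage: three coordinates, each with its own hidden state but shared weights.
params = init_optimizer()
grads = [1.0, -2.0, 0.5]
hiddens = [[0.0] * 4 for _ in grads]
updates, hiddens = coordinatewise_step(params, grads, hiddens)
```

Because the weights are shared, the number of optimizer parameters in this sketch does not depend on the dimension of the problem being optimized, and permuting the coordinates of the gradient simply permutes the updates. In a learned optimizer the weights would be meta-trained on a set of optimization problems rather than sampled at random as they are here.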