Keywords: Meta-Learning, Reinforcement Learning, Learned Optimization, Deep Learning
TL;DR: We propose OPEN, a method for learning optimizers designed to improve final return in reinforcement learning by tackling the difficulties of plasticity loss, exploration and non-stationarity.
Abstract: While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whether learned optimization can help overcome these problems. Our method, Learned **O**ptimization for **P**lasticity, **E**xploration and **N**on-stationarity (*OPEN*), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties. We show that our parameterization is flexible enough to enable meta-learning in diverse learning contexts, including the ability to use stochasticity for exploration. Our experiments demonstrate that when meta-trained on single and small sets of environments, *OPEN* outperforms or equals traditionally used optimizers. Furthermore, *OPEN* shows strong generalization across a *distribution of environments* and a range of agent architectures.
Submission Number: 6
Loading