THE RL PERCEPTRON: DYNAMICS OF POLICY LEARNING IN HIGH DIMENSIONS

Published: 03 Mar 2023, Last Modified: 31 Mar 2023
Physics4ML Poster
Readers: Everyone
Keywords: Statistical physics of learning, Generalization models, Reinforcement learning, REINFORCE policy gradient
Abstract: Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much of the theory of RL has focused on discrete state spaces or worst-case analyses, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here we propose a simple high-dimensional model of RL and derive its typical dynamics as a set of closed-form ODEs. We show that the model exhibits rich behavior, including delayed learning under sparse rewards, a speed-accuracy trade-off depending on reward stringency, and a dependence of learning regime on reward baselines. These results offer a first step toward understanding policy gradient methods in high-dimensional settings.