Learning Dynamics and the Geometry of Neural Dynamics in Recurrent Neural Controllers

Published: 07 Jun 2024, Last Modified: 07 Jun 2024 · InterpPol @ RLC-2024 (correct paper that fits the topic) · CC BY 4.0
Keywords: deep reinforcement learning; recurrent neural network; learning dynamics; dynamical systems
Abstract: Recurrent Neural Networks (RNNs) are versatile and widely used models of complex decision-making behavior across many fields, including artificial intelligence (AI) and neuroscience. Understanding how RNNs learn to perform complex tasks through interaction with an environment, i.e., as agents or controllers, is therefore broadly important. Much previous work has analyzed RNNs trained with supervised learning, whereas comparatively little attention has been paid to reinforcement learning (RL) with recurrent architectures and to their learning dynamics. Here, we take a step toward addressing this gap by thoroughly analyzing the learning dynamics of RNN-based artificial agents trained by RL to solve a classic nonlinear continuous control problem: the Inverted Pendulum. Our analysis found that training gradually sharpened the policy landscape and pruned the recurrent dynamics onto a ring that efficiently represents the angle between the pendulum and its goal location, a circular variable. During training, a stable fixed point (FP) emerged and moved across the state space until it approached the goal location. Furthermore, the FP's proximity to the goal location was significantly correlated with the reward obtained by the controller, providing a direct link between the RNN's representational geometry and the agent's task performance. The agent's memory capacity, quantified by its stimulus integration time, passed through distinct regimes over the course of training. Our framework provides key intuitions about the evolution of the control policy, neural dynamics, representational geometry, and memory in RNN-based agents. In future work, we will extend this framework to more complex environments with longer evidence-integration (memory) requirements and to more complex sequential decision-making and planning.
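The paper's code is not included on this page; the two sketches below illustrate, under stated assumptions, the techniques the abstract names. Neither is the authors' implementation: the first stands in sb3-contrib's RecurrentPPO with an LSTM policy on Gymnasium's Pendulum-v1 for the RL training setup, and the second is a generic optimization-based fixed-point search in the spirit of Sussillo and Barak (2013); all hyperparameters are placeholders.

```python
# Sketch 1 (assumed setup, not the paper's): train a recurrent (LSTM) policy
# on the inverted pendulum by RL. "MlpLstmPolicy" gives the actor-critic an
# LSTM core, so the trained controller is an RNN whose hidden dynamics can
# then be analyzed.
import gymnasium as gym
from sb3_contrib import RecurrentPPO

env = gym.make("Pendulum-v1")
model = RecurrentPPO("MlpLstmPolicy", env, verbose=0)
model.learn(total_timesteps=100_000)  # placeholder training budget
```

```python
# Sketch 2 (generic method, not the paper's code): locate approximate fixed
# points of a trained RNN cell by minimizing the "speed"
# q(h) = ||F(x, h) - h||^2 at a frozen input x. Hidden states with q ~ 0 are
# approximate fixed points of the recurrent dynamics.
import torch

def find_fixed_points(rnn_cell, x, n_inits=64, lr=1e-2, steps=2000, tol=1e-6):
    """rnn_cell: a trained torch.nn.RNNCell or GRUCell (assumed); x: 1-D input."""
    h = torch.randn(n_inits, rnn_cell.hidden_size, requires_grad=True)  # random inits
    x_batch = x.expand(n_inits, -1)  # same frozen input for every candidate
    opt = torch.optim.Adam([h], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        q = ((rnn_cell(x_batch, h) - h) ** 2).sum(dim=1)  # per-candidate speed
        q.mean().backward()
        opt.step()
    with torch.no_grad():
        q = ((rnn_cell(x_batch, h) - h) ** 2).sum(dim=1)
    return h.detach()[q < tol]  # keep only near-zero-speed states
```

Repeating the search at checkpoints saved during training, and classifying each fixed point by the eigenvalues of the linearized dynamics, is one way to trace the kind of trajectory the abstract describes: a stable FP emerging and migrating toward the goal location as the ring structure forms.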
Submission Number: 5