Planning behavior in a recurrent neural network that plays Sokoban

Published: 24 Jun 2024, Last Modified: 31 Jul 2024
Venue: ICML 2024 MI Workshop Poster
License: CC BY 4.0
Keywords: planning, RL, mechanistic interpretability, recurrent neural network
TL;DR: We train an RNN with model-free RL, show that it starts planning early in training, and that it paces at episode starts to get time to think.
Abstract: To predict how advanced neural networks generalize to novel situations, it is essential to understand how they reason. Guez et al. (2019, "An investigation of model-free planning") trained a recurrent neural network (RNN) to play Sokoban with model-free reinforcement learning. They found that adding extra computation steps to the start of episodes at test time improves the RNN's success rate. We investigate this phenomenon further, finding that it emerges rapidly early in training and then slowly fades, but only for comparatively easy levels. The RNN also often takes redundant actions at episode starts, and these are reduced by adding extra computation steps. Our results suggest that the RNN learns to take time to think by 'pacing', despite the per-step penalties, indicating that training incentivizes planning capabilities. The small size (1.29M parameters) and interesting behavior of this model make it an excellent model organism for mechanistic interpretability.
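A minimal sketch of the test-time intervention described in the abstract: before the agent commits to its first action, the recurrent core is ticked several extra times on the initial observation, giving it more computation without advancing the environment. The `RecurrentPolicy` class, its GRU core, and the `act_with_thinking_steps` helper are illustrative assumptions for exposition, not the authors' 1.29M-parameter architecture or implementation.

```python
import torch
import torch.nn as nn


class RecurrentPolicy(nn.Module):
    """Toy recurrent policy; a stand-in for the paper's RNN (hypothetical architecture)."""

    def __init__(self, obs_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.core = nn.GRUCell(hidden_dim, hidden_dim)  # recurrent "thinking" core
        self.policy_head = nn.Linear(hidden_dim, num_actions)

    def step(self, obs: torch.Tensor, h: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """One tick of the core: encode the observation, update the hidden state."""
        h = self.core(self.encoder(obs), h)
        return self.policy_head(h), h


def act_with_thinking_steps(policy: RecurrentPolicy, obs: torch.Tensor,
                            h: torch.Tensor, n_thinking_steps: int = 0):
    """Tick the core on the same observation several times before acting.

    This mimics "adding extra computation steps to the start of episodes":
    the hidden state is updated, but no action is taken and the environment
    does not advance during the extra ticks.
    """
    for _ in range(n_thinking_steps):
        _, h = policy.step(obs, h)  # discard action logits; keep updated hidden state
    logits, h = policy.step(obs, h)
    action = torch.distributions.Categorical(logits=logits).sample()
    return action, h


# Usage (hypothetical dimensions):
policy = RecurrentPolicy(obs_dim=64, hidden_dim=128, num_actions=4)
obs, h = torch.randn(64), torch.zeros(128)
action, h = act_with_thinking_steps(policy, obs, h, n_thinking_steps=5)
```

The paper's 'pacing' observation is the behavioral counterpart of this intervention: absent the extra ticks, the trained agent appears to take redundant actions at episode starts to obtain the same additional computation on its own.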
Submission Number: 135