Training Graph Neural Networks with Policy Gradients to Perform Tree Search

Matthew Macfarlane; Diederik M Roijers; Herke van Hoof

08 Oct 2022 (modified: 05 May 2023), Deep RL Workshop 2022, Readers: Everyone

Keywords: Reinforcement Learning, Tree Search, Planning, Graph Neural Networks, Monte Carlo Tree Search

TL;DR: This paper recognises that tree search heuristics can be represented with graph networks, and investigates how tree search policies can be learnt by parameterising them with graph neural networks and training them with Reinforcement Learning.

Abstract: Monte Carlo Tree Search (MCTS) has been shown to perform well on decision problems such as board games and Atari games, but it relies on heuristic design decisions that are non-adaptive and not necessarily optimal for all problems. Learned policies and value functions can augment MCTS by leveraging the state information at the nodes in the search tree. However, these learned functions do not take the search tree structure into account and can be sensitive to value estimation errors. In this paper, we propose a new method that uses Reinforcement Learning to learn how to expand the search tree and make decisions using Graph Neural Networks. This enables the policy to fully leverage the search tree and to learn how to search based on the specific problem. First, we show, in an environment where state information is limited, that the policy is able to leverage information from the search tree. We then find that the method outperforms popular baselines on two diverse problems known to require planning: Sokoban and the Travelling Salesman Problem.
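
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' architecture. It assumes a search tree stored as a list of parent indices, one scalar feature per node (for example a value estimate), a single round of mean-aggregation message passing standing in for a graph neural network, and a two-parameter linear scorer. All names (`message_pass`, `expansion_policy`, `reinforce_grad`, `w_self`, `w_agg`) are hypothetical. The policy is a softmax over leaf nodes that selects which node to expand, and the gradient is the standard REINFORCE estimate for a linear softmax policy.

```python
import math

def message_pass(features, parents):
    """One round of mean aggregation over tree neighbours (parent and children).

    Returns, for each node, the pair (own feature, mean neighbour feature).
    """
    n = len(features)
    neighbours = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent is not None:
            neighbours[child].append(parent)
            neighbours[parent].append(child)
    embedded = []
    for i in range(n):
        if neighbours[i]:
            agg = sum(features[j] for j in neighbours[i]) / len(neighbours[i])
        else:
            agg = 0.0
        embedded.append((features[i], agg))
    return embedded

def expansion_policy(features, parents, w_self, w_agg):
    """Softmax distribution over leaf nodes, the candidates for expansion."""
    embedded = message_pass(features, parents)
    has_child = {p for p in parents if p is not None}
    leaves = [i for i in range(len(features)) if i not in has_child]
    scores = [w_self * embedded[i][0] + w_agg * embedded[i][1] for i in leaves]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {leaf: e / total for leaf, e in zip(leaves, exps)}

def reinforce_grad(features, parents, w_self, w_agg, chosen, ret):
    """REINFORCE estimate: ret * d log pi(chosen) / d(w_self, w_agg)."""
    embedded = message_pass(features, parents)
    probs = expansion_policy(features, parents, w_self, w_agg)
    # For a linear softmax policy, the gradient of log pi(chosen) is the
    # chosen node's embedding minus the policy-weighted mean embedding.
    mean_self = sum(p * embedded[i][0] for i, p in probs.items())
    mean_agg = sum(p * embedded[i][1] for i, p in probs.items())
    return (ret * (embedded[chosen][0] - mean_self),
            ret * (embedded[chosen][1] - mean_agg))

# Toy tree: node 0 is the root, nodes 1 and 2 are its children,
# node 3 is a child of node 1. Leaves are therefore nodes 2 and 3.
parents = [None, 0, 0, 1]
features = [0.5, 0.2, 0.9, 0.4]
probs = expansion_policy(features, parents, w_self=0.0, w_agg=0.0)
grad = reinforce_grad(features, parents, 0.0, 0.0, chosen=2, ret=1.0)
```

Because the scorer sees both a node's own feature and an aggregate of its neighbours, the expansion decision depends on the tree structure rather than on the node state alone, which is the property the abstract emphasises. A real implementation would use learned vector embeddings, several message-passing rounds, and an automatic differentiation library.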

Supplementary Material: zip
