Keywords: Monte Carlo Tree Search, knowledge graph, reinforcement learning
TL;DR: We developed an agent that learns to walk over a graph by modeling the Q-network, the policy network, and the value network, which are combined together with a Monte Carlo Tree Search (MCTS) to search for the target node.
Abstract: We consider the problem of learning to walk over a graph towards a target node for a given input query and a source node (e.g., knowledge graph reasoning). We propose a new method called ReinforceWalk, which consists of a deep recurrent neural network (RNN) and a Monte Carlo Tree Search (MCTS). The RNN encodes the history of observations and map it into the Q-value, the policy and the state value. The MCTS is combined with the RNN policy to generate trajectories with more positive rewards, overcoming the sparse reward problem. Then, the RNN policy is updated in an off-policy manner from these trajectories. ReinforceWalk repeats these steps to learn the policy. At testing stage, the MCTS is also combined with the RNN to predict the target node with higher accuracy. Experiment results show that we are able to learn better policies from less number of rollouts compared to other methods, which are mainly based on policy gradient method.