Improving Exploration in UCT Using Local Manifolds

Sriram Srinivasan, E. Talvitie, Michael H. Bowling

2015 (modified: 13 May 2021), AAAI 2015

Abstract: Monte Carlo planning has proven successful in many sequential decision-making settings, but it suffers from poor exploration when rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. When the state space does not have a natural distance metric, we show how to learn a local manifold from the transition graph of states in the near future and use it to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
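The abstract does not spell out how statistics are shared between states, so the following is only an illustrative sketch of one possible reading: the visit counts and returns used in a UCB1-style score are pooled over nearby states, weighted by a kernel on the given distance metric. The Gaussian kernel, the `bandwidth` parameter, the `log(1 + n)` form of the exploration term, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math


def gaussian_kernel(d, bandwidth=1.0):
    """Similarity weight in [0, 1] that decays with distance d (assumed kernel)."""
    return math.exp(-((d / bandwidth) ** 2))


def smoothed_ucb_scores(state, stats, distance, actions, c=math.sqrt(2), bandwidth=1.0):
    """UCB1-style scores for `state`, using kernel-weighted statistics.

    stats    : dict mapping (state, action) -> (visit_count, total_return)
    distance : function (state, state) -> non-negative float
    Returns a dict mapping each action to its score.
    """
    pooled = {}
    for a in actions:
        n, total = 0.0, 0.0
        for (s, a2), (count, ret) in stats.items():
            if a2 != a:
                continue
            # Nearby states contribute more to the pooled statistics.
            w = gaussian_kernel(distance(state, s), bandwidth)
            n += w * count
            total += w * ret
        pooled[a] = (n, total)

    total_n = sum(n for n, _ in pooled.values())
    scores = {}
    for a, (n, total) in pooled.items():
        if n == 0.0:
            # No (effective) data for this action: try it first.
            scores[a] = math.inf
        else:
            # log(1 + total_n) keeps the bonus defined for fractional counts
            # (an assumption of this sketch).
            bonus = c * math.sqrt(math.log(1.0 + total_n) / n)
            scores[a] = total / n + bonus
    return scores


# Hypothetical usage: integer states on a line, absolute difference as the metric.
example_stats = {(0, "left"): (10, 1.0), (1, "right"): (10, 9.0)}
example_scores = smoothed_ucb_scores(
    0, example_stats, lambda s, t: abs(s - t), ["left", "right", "jump"]
)
```

In this toy setup, state 0 has never tried `right`, yet it still receives an estimate for that action from the neighbouring state 1. That is the kind of generalization across similar states the abstract describes, although the paper's actual update rule may differ.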
