Improved Diversity in Nested Rollout Policy Adaptation

Stefan Edelkamp, Tristan Cazenave

Published: 2016, Last Modified: 15 May 2023, KI 2016, Readers: Everyone

Abstract: For combinatorial search in single-player games, nested Monte-Carlo search is an apparent alternative to algorithms like UCT that are applied in two-player and general games. To trade off exploration against exploitation, the randomized search procedure intensifies the search with increasing recursion depth. If a concise mapping from states to actions is available, the integration of policy learning yields nested rollout with policy adaptation (NRPA), while Beam-NRPA keeps a bounded number of solutions in each recursion level. In this paper we propose refinements for Beam-NRPA that improve the runtime and the solution diversity.
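
For readers unfamiliar with the baseline, the following is a minimal sketch of plain NRPA as it is commonly described in the literature, not the refined Beam-NRPA proposed in this paper. The toy domain (`legal_moves`, `score`), the move encoding `code`, and the parameters `alpha` and `iterations` are assumptions chosen purely for illustration. Each recursion level keeps its own copy of the policy and shifts it toward the best sequence found so far, which is how the search intensifies with depth.

```python
import math
import random

LENGTH = 8  # toy problem size (assumption): sequences of 8 digits


def legal_moves(state):
    # Hypothetical toy domain: append a digit 0..2 until the sequence is full.
    return [0, 1, 2] if len(state) < LENGTH else []


def score(state):
    # Hypothetical objective: reward digits that equal (position mod 3).
    return sum(1 for i, m in enumerate(state) if m == i % 3)


def code(state, move):
    # Concise mapping from (state, action) to a policy key.
    return (len(state), move)


def rollout(policy, rng):
    # Level 0: sample one full sequence with softmax probabilities.
    state = []
    while True:
        moves = legal_moves(state)
        if not moves:
            return score(state), list(state)
        weights = [math.exp(policy.get(code(state, m), 0.0)) for m in moves]
        state.append(rng.choices(moves, weights=weights)[0])


def adapt(policy, seq, alpha=1.0):
    # Gradient-style update that raises the probability of the moves in seq.
    new = dict(policy)
    state = []
    for best in seq:
        moves = legal_moves(state)
        weights = [math.exp(policy.get(code(state, m), 0.0)) for m in moves]
        z = sum(weights)
        for m, w in zip(moves, weights):
            key = code(state, m)
            new[key] = new.get(key, 0.0) - alpha * w / z
        key = code(state, best)
        new[key] = new.get(key, 0.0) + alpha
        state.append(best)
    return new


def nrpa(level, policy, rng, iterations=20):
    # Each level runs `iterations` searches one level below and adapts
    # its own policy copy toward the best sequence seen at this level.
    if level == 0:
        return rollout(policy, rng)
    best_score, best_seq = -math.inf, []
    policy = dict(policy)
    for _ in range(iterations):
        s, seq = nrpa(level - 1, policy, rng, iterations)
        if s >= best_score:
            best_score, best_seq = s, seq
        policy = adapt(policy, best_seq, alpha=1.0)
    return best_score, best_seq
```

A call such as `nrpa(2, {}, random.Random(0))` returns a `(score, sequence)` pair. Beam-NRPA, as the abstract notes, would instead keep a bounded number of solutions per level rather than the single best one tracked here.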
