Width-Based Lookaheads Augmented with Base Policies for Stochastic Shortest Paths

Anonymous

18 Mar 2019 (modified: 05 May 2023), ICAPS 2019 Workshop HSDIP Blind Submission, Readers: Everyone

Keywords: width-based planning, finite-horizon MDPs, rollout algorithm, base policies

TL;DR: We propose a new Monte Carlo Tree Search / rollout algorithm that relies on width-based search to construct a lookahead.

Abstract: Sequential decision problems in real-world applications often need to be solved in real time, requiring algorithms that perform well under a restricted computational budget. Width-based lookaheads have shown state-of-the-art performance on classical planning problems as well as on the Atari games with tight budgets. In this work we investigate width-based lookaheads over Stochastic Shortest Path (SSP) problems. We analyse why width-based algorithms perform poorly over SSP problems, and overcome these pitfalls by proposing a method to estimate costs-to-go. We formalize width-based lookaheads as an instance of the rollout algorithm, give a definition of width for SSP problems, and explain its sample complexity. Our experimental results over a variety of SSP benchmarks show that the algorithm outperforms other state-of-the-art rollout algorithms such as UCT and RTDP.
