Structure and randomness in planning and reinforcement learning

Piotr Kozakowski; Piotr Januszewski; Konrad Czechowski; Łukasz Kuciński; Piotr Miłoś

28 Sept 2020 (modified: 05 May 2023); ICLR 2021 Conference Blind Submission; Readers: Everyone

Keywords: reinforcement learning, uncertainty, model-based, MCTS

Abstract: Planning in large state spaces inevitably needs to balance the depth and breadth of the search. This balance has a crucial impact on a planner's performance, yet most planners manage it only implicitly. We present a novel method, $\textit{Shoot Tree Search (STS)}$, which makes it possible to control this trade-off more explicitly. Our algorithm can be understood as an interpolation between two celebrated search mechanisms: MCTS and random shooting. It also lets the user control the bias-variance trade-off, akin to $TD(n)$, but in the tree search context. In experiments on challenging domains, we show that STS can get the best of both worlds, consistently achieving higher scores.
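The bias-variance trade-off that the abstract compares to $TD(n)$ can be illustrated with a generic n-step return. The sketch below is not the STS algorithm and is not taken from the paper; the function name `n_step_return` and its arguments are hypothetical, chosen only for illustration. It assumes a single trajectory with rewards r_0, ..., r_{T-1} and value estimates V(s_0), ..., V(s_T).

```python
def n_step_return(rewards, values, n, gamma=0.99):
    """Generic n-step return (illustrative, not the paper's method).

    rewards[k] is the reward received after step k.
    values[k] is a value estimate for the state reached after k steps,
    so len(values) must equal len(rewards) + 1.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    if not 0 <= n <= len(rewards):
        raise ValueError("n must lie between 0 and len(rewards)")

    # Sum the first n observed rewards, discounted by gamma.
    observed = sum((gamma ** k) * rewards[k] for k in range(n))

    # Bootstrap from the value estimate of the state reached after n steps.
    return observed + (gamma ** n) * values[n]
```

With n = 0 the result is just the value estimate of the starting state, which has low variance but inherits any bias in that estimate. With n equal to the trajectory length the result relies mostly on observed rewards, which reduces that bias at the cost of higher variance. Intermediate n interpolates between the two.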

One-sentence Summary: We present a novel planning algorithm based on a modern version of MCTS (with neural-network heuristics).

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Reviewed Version (pdf): https://openreview.net/references/pdf?id=ciNF7I-kFD

