Trust, but verify: model-based exploration in sparse reward environments

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: reinforcement learning, model-based, exploration, on-line planning, imperfect environment model
Abstract: We propose the $\textit{trust-but-verify}$ (TBV) mechanism, a new method that uses model uncertainty estimates to guide exploration. The mechanism augments graph search planning algorithms with the capacity to deal with a learned model's imperfections. We identify a frequent type of model error, which we dub $\textit{false loops}$, that is particularly dangerous for graph search algorithms in discrete environments. These errors impose falsely pessimistic expectations and thus hinder exploration. We confirm this experimentally and show that TBV can effectively alleviate them. Combined with MCTS or Best-First Search, TBV forms an effective model-based reinforcement learning solution that robustly solves sparse reward problems.
One-sentence Summary: We address exploration problems arising from on-line planning with learned environment models.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=r571HjVJ8V
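
To make the false-loop failure mode from the abstract concrete, here is a minimal, hypothetical Python sketch (not the paper's actual TBV algorithm): a best-first search over a learned model in which a predicted transition that would revisit an already-expanded state is trusted only when a model-uncertainty estimate falls below a threshold tau, and is otherwise set aside for verification in the real environment. The interfaces model, uncertainty, and tau, and the function name itself, are illustrative assumptions.

    import heapq
    import itertools

    def tbv_best_first_search(start, model, goal_test, uncertainty,
                              tau=0.1, max_steps=1000):
        # model(state)             -> iterable of (action, next_state, reward)
        #                             according to the *learned* model (assumed API)
        # uncertainty(state, act)  -> scalar epistemic-uncertainty estimate (assumed API)
        counter = itertools.count()          # tie-breaker so heap never compares states
        frontier = [(0.0, next(counter), start, [])]
        seen = {start}
        to_verify = []                       # suspected false loops for real-env checks
        while frontier and max_steps > 0:
            max_steps -= 1
            neg_return, _, state, plan = heapq.heappop(frontier)
            if goal_test(state):
                return plan, to_verify
            for action, nxt, reward in model(state):
                if nxt in seen:
                    # The model predicts a loop back to a known state. If the
                    # model is uncertain here, this may be a "false loop":
                    # flag it for verification rather than silently trusting it.
                    if uncertainty(state, action) > tau:
                        to_verify.append((state, action))
                    continue
                seen.add(nxt)
                heapq.heappush(frontier,
                               (neg_return - reward, next(counter), nxt, plan + [action]))
        return None, to_verify

    # Toy check: a 5-state chain where the learned model wrongly predicts that
    # acting in state 3 loops back to the start, cutting the only path to the goal.
    def model(s):
        nxt = 0 if s == 3 else s + 1         # erroneous loop prediction at s == 3
        return [("right", nxt, 1.0 if nxt == 4 else 0.0)]

    plan, suspects = tbv_best_first_search(
        0, model, goal_test=lambda s: s == 4,
        uncertainty=lambda s, a: 0.9 if s == 3 else 0.0)
    print(plan, suspects)                    # -> None [(3, 'right')]

In the toy run the search fails under the faulty model, but instead of permanently pruning the branch it returns the high-uncertainty loop transition as a candidate to verify in the real environment, which is the "trust, but verify" intuition the abstract describes.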