Uncertainty - sensitive learning and planning with ensembles

Piotr Miłoś; Łukasz Kuciński; Konrad Czechowski; Piotr Kozakowski; Maciej Klimek

Uncertainty - sensitive learning and planning with ensembles

Piotr Miłoś, Łukasz Kuciński, Konrad Czechowski, Piotr Kozakowski, Maciej Klimek

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

Abstract: We propose a reinforcement learning framework for discrete environments in which an agent optimizes its behavior on two timescales. For the short one, it uses tree search methods to perform tactical decisions. The long strategic level is handled with an ensemble of value functions learned using $TD$-like backups. Combining these two techniques brings synergies. The planning module performs \textit{what-if} analysis allowing to avoid short-term pitfalls and boost backups of the value function. Notably, our method performs well in environments with sparse rewards where standard $TD(1)$ backups fail. On the other hand, the value functions compensate for inherent short-sightedness of planning. Importantly, we use ensembles to measure the epistemic uncertainty of value functions. This serves two purposes: a) it stabilizes planning, b) it guides exploration. We evaluate our methods on discrete environments with sparse rewards: the Deep sea chain environment, toy Montezuma's Revenge, and Sokoban. In all the cases, we obtain speed-up of learning and boost to the final performance.

Code: https://github.com/learningandplanningICLR/learningandplanning

Keywords: deep reinfocement learning, mcts, ensembles, uncertainty

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/uncertainty-sensitive-learning-and-planning/code)

Original Pdf: pdf

9 Replies

Loading