Uncertainty-sensitive learning and planning with ensembles

Sep 25, 2019 Blind Submission readers: everyone
  • Abstract: We propose a reinforcement learning framework for discrete environments in which an agent optimizes its behavior on two timescales. On the short timescale, it uses tree search to make tactical decisions. The long, strategic timescale is handled by an ensemble of value functions learned using $TD$-like backups. Combining the two techniques yields synergies. The planning module performs what-if analysis, which helps the agent avoid short-term pitfalls and boosts the backups of the value function. Notably, our method performs well in environments with sparse rewards where standard $TD(1)$ backups fail. The value functions, in turn, compensate for the inherent short-sightedness of planning. Importantly, we use ensembles to measure the epistemic uncertainty of the value functions. This serves two purposes: (a) it stabilizes planning, and (b) it guides exploration. We evaluate our method on discrete environments with sparse rewards: the Deep sea chain environment, toy Montezuma's Revenge, and Sokoban. In all cases, we obtain faster learning and better final performance.
  • Code: https://github.com/learningandplanningICLR/learningandplanning
  • Keywords: deep reinforcement learning, MCTS, ensembles, uncertainty
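The abstract says that an ensemble of value functions is used to measure epistemic uncertainty, which in turn guides exploration. The sketch below illustrates one common way to do this: score a state by the ensemble mean plus a multiple of the ensemble standard deviation. It is an illustration under stated assumptions, not the authors' implementation. The tabular ensemble, the function names `ensemble_stats` and `uncertainty_aware_value`, and the weight `kappa` are all hypothetical and do not come from the paper or its code.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, Sequence, Tuple

# Assumption: each ensemble member is a tabular value function (state -> value).
ValueTable = Dict[Hashable, float]


def ensemble_stats(ensemble: Sequence[ValueTable], state: Hashable) -> Tuple[float, float]:
    """Return the mean and population standard deviation of the members' predictions."""
    predictions = [member[state] for member in ensemble]
    return mean(predictions), pstdev(predictions)


def uncertainty_aware_value(
    ensemble: Sequence[ValueTable], state: Hashable, kappa: float = 1.0
) -> float:
    """Score a state as mean + kappa * std (hypothetical optimistic bonus).

    Disagreement between members stands in for epistemic uncertainty:
    kappa > 0 favors states the ensemble is unsure about,
    kappa = 0 falls back to the plain ensemble mean.
    """
    mu, sigma = ensemble_stats(ensemble, state)
    return mu + kappa * sigma


# Toy ensemble: all members agree on state "a" and disagree on state "b".
ensemble = [
    {"a": 1.0, "b": -1.0},
    {"a": 1.0, "b": 0.5},
    {"a": 1.0, "b": 2.0},
]

# Without the bonus the agent prefers "a"; with it, the uncertain "b" wins.
greedy_choice = max(["a", "b"], key=lambda s: uncertainty_aware_value(ensemble, s, kappa=0.0))
exploratory_choice = max(["a", "b"], key=lambda s: uncertainty_aware_value(ensemble, s, kappa=1.0))
```

In this toy setting, `kappa` trades off exploitation against exploration. A planner could use the same score at its leaves, though how the paper combines the ensemble with tree search is not specified in the abstract.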