DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
Nov 03, 2017 (modified: Nov 03, 2017) · ICLR 2018 Conference Blind Submission · readers: everyone
Abstract: Exploration is a fundamental aspect of Reinforcement Learning. Two key challenges are how to focus exploration on more valuable states, and how to direct exploration toward gaining new world knowledge. Visit-counters have proven useful, both in practice and in theory, for directed exploration. However, a major limitation of counters is their locality: they consider only the immediate, one-step exploration value. While there are a few model-based solutions to this difficulty, a model-free approach is still missing. We propose $E$-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using $E$-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to learn continuous MDPs.
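The core idea in the abstract can be sketched concretely. A minimal, illustrative implementation (assumptions: the constant names `ALPHA` and `GAMMA_E`, the tabular setup, and the specific update form are choices made here for illustration, not taken verbatim from the paper): $E$-values start at 1 and are updated like a SARSA value function whose "reward" is always 0, so each visit shrinks $E(s,a)$ while also propagating exploratory value back from the successor state-action pair, which is what lets them look beyond the one-step locality of plain counters.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2
ALPHA = 0.1      # learning rate (assumed value)
GAMMA_E = 0.9    # discount for exploratory value (assumed value)

# E-values, initialized to 1 for every state-action pair.
E = np.ones((N_STATES, N_ACTIONS))

def update_e(s, a, s_next, a_next):
    """SARSA-style update with zero reward: each visit shrinks E(s, a)
    while mixing in the exploratory value of the next state-action."""
    E[s, a] += ALPHA * (GAMMA_E * E[s_next, a_next] - E[s, a])

def generalized_counter(s, a):
    """A counter-like quantity recovered from E.
    With GAMMA_E = 0 the update gives E(s, a) = (1 - ALPHA)**n after
    n visits, so log E / log(1 - ALPHA) equals the visit count n;
    with GAMMA_E > 0 it generalizes the counter over trajectories."""
    return np.log(E[s, a]) / np.log(1.0 - ALPHA)
```

For example, a single visit to an unvisited pair (all successors still at $E=1$) moves $E(s,a)$ from $1$ to $1 + \alpha(\gamma_E - 1) = 0.99$, and the generalized counter rises above zero, behaving like a fractional visit count.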
TL;DR: We propose a generalization of visit-counters that evaluates the propagating exploratory value over trajectories, enabling efficient exploration for model-free RL.