Keywords: Exploration, Discrete representation, Intrinsic reward
Abstract: Exploration remains an open problem in reinforcement learning. Ideally, a useful exploration method should efficiently explore sparse-reward environments, scale to large environments, and be simple to implement. Most agents use simple randomization methods such as ε-greedy because of their simplicity and low computational cost; however, these methods struggle in sparse-reward settings and take a long time to converge. Count-based methods encourage exploration of less-visited areas but do not scale well to large environments. Extensions of count-based methods to function approximation improve performance in complex environments such as Montezuma's Revenge, but are rarely used because of their computational and implementation complexity. We propose a new method that achieves all three desiderata simultaneously. Our exploration method handles large environments by maintaining visit counts within multiple overlapping partitions of the state space and deriving exploration bonuses from them. We evaluate our algorithms on three continuous-observation environments where raw count-based methods cannot be applied, including MiniGrid DoorKey with image-based observations.
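To make the core idea concrete, below is a minimal sketch of a count-based exploration bonus computed over several overlapping partitions of a continuous state space. The class name `PartitionCountBonus`, the random-grid partitioning scheme, the 1/√n per-cell bonus, and the mean aggregation across partitions are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np


class PartitionCountBonus:
    """Hypothetical sketch: visit counts over several overlapping, coarse
    partitions of a continuous observation space (randomly shifted/scaled
    grids here), combined into a single exploration bonus. The submission's
    actual partitioning scheme and bonus formula may differ."""

    def __init__(self, obs_dim, n_partitions=8, bins_per_dim=10, seed=0):
        rng = np.random.default_rng(seed)
        # Each partition gets its own random offset and scale, so the grids
        # overlap and discretize the space at slightly different resolutions.
        self.offsets = rng.uniform(0.0, 1.0, size=(n_partitions, obs_dim))
        self.scales = rng.uniform(0.5, 2.0, size=(n_partitions, obs_dim))
        self.bins_per_dim = bins_per_dim
        self.counts = [dict() for _ in range(n_partitions)]

    def bonus(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        bonuses = []
        for k, table in enumerate(self.counts):
            # Discretize the observation under partition k's grid.
            cell = tuple(
                np.floor((obs * self.scales[k] + self.offsets[k])
                         * self.bins_per_dim).astype(int)
            )
            table[cell] = table.get(cell, 0) + 1
            # Classic count-based bonus 1/sqrt(n) for this partition's cell.
            bonuses.append(1.0 / np.sqrt(table[cell]))
        # Aggregate across partitions; the mean is one simple choice.
        return float(np.mean(bonuses))
```

In a typical count-based setup, this bonus would be added to the environment reward as an intrinsic reward, e.g. `r_total = r_env + beta * bonus(obs)` for some coefficient `beta`; whether the submission uses this additive form is an assumption here.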
Submission Number: 9