TL;DR: We conduct exploration using intrinsic rewards based on a weighted distance to nearest neighbors in representational space.
Abstract: We present a new approach for efficient exploration that leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach gauges novelty with intrinsic rewards based on a weighted distance to nearest neighbors in the low-dimensional representational space.
We then use these intrinsic rewards for sample-efficient exploration with planning routines in representational space.
One key element of our approach is that we perform multiple gradient steps between consecutive environment steps to maintain model accuracy. We test our approach on a number of maze tasks as well as a control problem, and show that our exploration approach is more sample-efficient than strong baselines.
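The abstract does not give the exact form of the intrinsic reward; as a rough illustration only, a weighted k-nearest-neighbor distance in a learned low-dimensional embedding space might be computed as in the sketch below. The function name, the memory buffer of past encodings, and the rank-based weighting are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def intrinsic_reward(z, memory, k=5, eps=1e-8):
    """Illustrative novelty bonus (hypothetical, not the authors' API):
    a weighted distance from the encoded state `z` to its k nearest
    neighbors among previously visited encodings stored in `memory`."""
    if len(memory) == 0:
        return 1.0  # before any states are stored, everything is novel
    dists = np.linalg.norm(np.asarray(memory) - z, axis=1)
    nearest = np.sort(dists)[:k]
    # Weight closer neighbors more heavily; this 1/rank scheme is an assumption.
    weights = 1.0 / np.arange(1, len(nearest) + 1)
    return float(np.dot(weights, nearest) / (weights.sum() + eps))
```

In such a scheme, states whose encodings lie far from everything seen so far receive a large bonus, which a planning routine in representational space could then exploit to direct exploration toward novel regions.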
Keywords: Reinforcement Learning, Exploration
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2009.13579/code)