Keywords: Reinforcement learning, Intrinsic motivation, Junction states, Information theory, Heuristic, Exploration
TL;DR: We propose a junction state estimation heuristic that is computationally efficient, increases exploration efficiency, and helps in learning downstream robotic manipulation tasks.
Abstract: Exploration is one of the key bottlenecks for efficient learning in reinforcement learning, especially in the presence of sparse rewards. One way to traverse the environment faster is to pass through junctions, or metaphorical doors, in the state space. We propose a novel heuristic, $Door(s)$, focused on such narrow passages that serve as pathways to a large number of other states. Our approach estimates the state occupancy distribution and computes its entropy, which forms the basis for our measure. Its computation is more sample-efficient than that of comparable methods, and it works robustly over longer horizons. Our results highlight the detection of dead-end states, show increased exploration efficiency, and demonstrate that $Door(s)$ encodes specific behaviors useful for downstream learning of various robotic manipulation tasks.
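To make the entropy computation mentioned in the abstract concrete, here is a minimal sketch, not the authors' implementation: it estimates a discrete state occupancy distribution from visit counts and computes its Shannon entropy, the quantity the abstract says underlies the $Door(s)$ measure. The function name `occupancy_entropy` and the input `visited_states` (a list of discrete state ids) are hypothetical; the paper's actual estimator is not specified here.

```python
import numpy as np

def occupancy_entropy(visited_states, num_states):
    """Entropy (in nats) of the empirical state occupancy distribution."""
    counts = np.bincount(visited_states, minlength=num_states)
    p = counts / counts.sum()      # empirical occupancy distribution
    p = p[p > 0]                   # drop zero-probability states
    return -np.sum(p * np.log(p))  # Shannon entropy

# Example: a trajectory that repeatedly passes through state 2,
# a candidate "junction" connecting the other states.
trajectory = [0, 1, 2, 3, 2, 4, 2, 5]
print(occupancy_entropy(trajectory, num_states=6))
```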
Supplementary Material: zip
Spotlight: zip
Submission Number: 362