Keywords: Exploration, hierarchical RL, planning, option discovery
Abstract: We seek to design reinforcement learning agents that build plannable models of the world that are abstract in both state and time. We propose a new algorithm to construct a skill graph; nodes in the skill graph represent abstract states and edges represent skill policies. Previous works that learn a skill graph rely on random sampling from the state space and nearest-neighbor search, operations that are infeasible in environments with high-dimensional observations (for example, images). Furthermore, previous algorithms attempt to increase the probability of all edges (by repeatedly executing the corresponding skills) so that the resulting graph is robust and reliable everywhere. However, exhaustive coverage is infeasible in large environments, and agents should prioritize practicing skills that are more likely to lead to higher reward. We show that our agent can solve challenging image-based exploration problems more rapidly than vanilla model-free RL and state-of-the-art novelty-based exploration; we then show that the resulting abstract model solves a family of tasks not provided during the agent's exploration phase.
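To make the skill-graph structure described in the abstract concrete, below is a minimal Python sketch under stated assumptions: nodes are abstract states, edges store skill policies, and planning is a graph search over the abstract model. The names (`SkillGraph`, `add_skill`, `plan`) and the breadth-first planner are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a skill graph: nodes are abstract states,
# edges are skill policies. Illustrative only, not the authors' code.
from collections import defaultdict, deque
from typing import Callable, Dict, Hashable, List, Optional

Policy = Callable[[object], object]  # maps an observation to an action


class SkillGraph:
    def __init__(self) -> None:
        # adjacency map: source abstract state -> {target abstract state: skill policy}
        self.edges: Dict[Hashable, Dict[Hashable, Policy]] = defaultdict(dict)

    def add_skill(self, src: Hashable, dst: Hashable, policy: Policy) -> None:
        """Register a skill policy intended to drive the agent from
        abstract state `src` to abstract state `dst`."""
        self.edges[src][dst] = policy

    def plan(self, start: Hashable, goal: Hashable) -> Optional[List[Policy]]:
        """Breadth-first search over abstract states; returns the sequence
        of skill policies to execute, or None if the goal is unreachable."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, skills = frontier.popleft()
            if state == goal:
                return skills
            for nxt, policy in self.edges[state].items():
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, skills + [policy]))
        return None
```

In this sketch, executing a returned plan means running each skill policy in sequence until its target abstract state is reached; the abstract's point about prioritization would correspond to choosing which edges to practice (i.e., which skills to refine) based on expected reward rather than uniformly.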
Submission Number: 232