- TL;DR: We learn a task-agnostic world graph abstraction of the environment and show how using it for structured exploration can significantly accelerate downstream task-specific RL.
- Abstract: Efficiently learning to solve tasks in complex environments is a key challenge for reinforcement learning (RL) agents. We propose to decompose a complex environment using a task-agnostic world graphs, an abstraction that accelerates learning by enabling agents to focus exploration on a subspace of the environment.The nodes of a world graph are important waypoint states and edges represent feasible traversals between them. Our framework has two learning phases: 1) identifying world graph nodes and edges by training a binary recurrent variational auto-encoder (VAE) on trajectory data and 2) a hierarchical RL framework that leverages structural and connectivity knowledge from the learned world graph to bias exploration towards task-relevant waypoints and regions. We show that our approach significantly accelerates RL on a suite of challenging 2D grid world tasks: compared to baselines, world graph integration doubles achieved rewards on simpler tasks, e.g. MultiGoal, and manages to solve more challenging tasks, e.g. Door-Key, where baselines fail.
- Keywords: environment decomposition, subgoal discovery, generative modeling, reinforcement learning, unsupervised learning