Learning World Graph Decompositions To Accelerate Reinforcement Learning

Wenling Shang; Alex Trott; Stephan Zheng; Caiming Xiong; Richard Socher

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: We learn a task-agnostic world graph abstraction of the environment and show how using it for structured exploration can significantly accelerate downstream task-specific RL.

Abstract: Efficiently learning to solve tasks in complex environments is a key challenge for reinforcement learning (RL) agents. We propose to decompose a complex environment using a task-agnostic world graph, an abstraction that accelerates learning by enabling agents to focus exploration on a subspace of the environment. The nodes of a world graph are important waypoint states, and its edges represent feasible traversals between them. Our framework has two learning phases: 1) identifying world graph nodes and edges by training a binary recurrent variational auto-encoder (VAE) on trajectory data, and 2) training a hierarchical RL framework that leverages structural and connectivity knowledge from the learned world graph to bias exploration towards task-relevant waypoints and regions. We show that our approach significantly accelerates RL on a suite of challenging 2D grid world tasks: compared to baselines, world graph integration doubles achieved rewards on simpler tasks, e.g., MultiGoal, and manages to solve more challenging tasks, e.g., Door-Key, where baselines fail.

Keywords: environment decomposition, subgoal discovery, generative modeling, reinforcement learning, unsupervised learning
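
To make the world graph abstraction from the abstract concrete, here is a minimal Python sketch of the data structure only: nodes are waypoint states and edges are feasible traversals between them. It is not the authors' method. The waypoints are hand-picked rather than discovered by a binary recurrent VAE, and the edges come from a breadth-first search rather than from trajectory data. The names `bfs_distances`, `build_world_graph`, `GRID`, `WAYPOINTS`, and the `max_steps` threshold are all hypothetical, introduced for illustration.

```python
from collections import deque


def bfs_distances(grid, start):
    """Return shortest step counts from `start` to every reachable free cell."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def build_world_graph(grid, waypoints, max_steps):
    """Connect waypoints that are reachable from each other within `max_steps`.

    Assumption: waypoints are given. In the paper they are learned; here they
    are supplied by hand to keep the sketch self-contained.
    """
    graph = {w: {} for w in waypoints}
    for w in waypoints:
        dist = bfs_distances(grid, w)
        for other in waypoints:
            if other != w and other in dist and dist[other] <= max_steps:
                graph[w][other] = dist[other]  # edge weight = number of steps
    return graph


# A toy 2D grid world: '#' is a wall, '.' is a free cell (illustrative only).
GRID = [
    "#######",
    "#..#..#",
    "#.....#",
    "#..#..#",
    "#######",
]
WAYPOINTS = [(1, 1), (2, 3), (3, 5)]  # hand-picked (row, col) waypoint states

world_graph = build_world_graph(GRID, WAYPOINTS, max_steps=3)
for node, edges in world_graph.items():
    print(node, "->", edges)
```

In this toy layout the central cell `(2, 3)` links the two rooms, so an agent that explores by moving between waypoints covers both rooms without visiting every cell, which is the intuition behind biasing exploration towards waypoints.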
