Keywords: Exploration, hierarchical RL, planning, option discovery
Abstract: We seek to design reinforcement learning agents that build plannable models of the world that are abstract in both state and time. We propose a new algorithm to construct a skill graph; nodes in the skill graph represent abstract states and edges represent skill policies. Previous works that learn a skill graph rely on random sampling from the state space and nearest-neighbor search, operations that are infeasible in environments with high-dimensional observations (for example, images). Furthermore, previous algorithms attempt to increase the probability of all edges (by repeatedly executing the corresponding skills) so that the resulting graph is robust and reliable everywhere. However, exhaustive coverage is infeasible in large environments, and agents should prioritize practicing skills that are more likely to lead to higher reward. We show that our agent can solve challenging image-based exploration problems more rapidly than vanilla model-free RL and state-of-the-art novelty-based exploration; we then show that the resulting abstract model solves a family of tasks not provided during the agent's exploration phase.
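To make the skill-graph structure described in the abstract concrete, below is a minimal Python sketch under stated assumptions: nodes are abstract states, edges store skill policies, and planning is a graph search over the abstract model. The names (`SkillGraph`, `add_skill`, `plan`) and the breadth-first planner are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a skill graph: nodes are abstract states,
# edges are skill policies. Illustrative only, not the authors' code.
from collections import defaultdict, deque
from typing import Callable, Dict, Hashable, List, Optional

Policy = Callable[[object], object]  # maps an observation to an action


class SkillGraph:
    def __init__(self) -> None:
        # adjacency map: source abstract state -> {target abstract state: skill policy}
        self.edges: Dict[Hashable, Dict[Hashable, Policy]] = defaultdict(dict)

    def add_skill(self, src: Hashable, dst: Hashable, policy: Policy) -> None:
        """Register a skill policy intended to drive the agent from
        abstract state `src` to abstract state `dst`."""
        self.edges[src][dst] = policy

    def plan(self, start: Hashable, goal: Hashable) -> Optional[List[Policy]]:
        """Breadth-first search over abstract states; returns the sequence
        of skill policies to execute, or None if the goal is unreachable."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, skills = frontier.popleft()
            if state == goal:
                return skills
            for nxt, policy in self.edges[state].items():
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, skills + [policy]))
        return None
```

In this sketch, executing a returned plan means running each skill policy in sequence until its target abstract state is reached; the abstract's point about prioritization would correspond to choosing which edges to practice (i.e., which skills to refine) based on expected reward rather than uniformly.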
Submission Number: 232