The Surprising Effectiveness of Latent World Models for Continual Reinforcement Learning

Samuel Kessler; Piotr Miłoś; Jack Parker-Holder; Stephen J. Roberts

08 Oct 2022 (modified: 22 Jun 2025), Deep RL Workshop 2022, Readers: Everyone

Keywords: Continual reinforcement learning, lifelong learning, continual learning, model-based reinforcement learning

TL;DR: We propose using world models for continual reinforcement learning and show that they are a surprisingly good baseline for multiple reasons.

Abstract: We study the use of model-based reinforcement learning methods, in particular world models, for continual reinforcement learning. In continual reinforcement learning, an agent is required to solve one task and then another sequentially while retaining performance on past tasks, that is, without forgetting them. World models offer a task-agnostic solution: they do not require knowledge of task changes. World models are a straightforward baseline for continual reinforcement learning for three main reasons. First, forgetting in the world model is prevented by persisting existing experience replay buffers across tasks, so that experience from previous tasks is replayed when learning the world model. Second, they are sample efficient. Third, they offer a task-agnostic exploration strategy through the uncertainty in the trajectories generated by the world model. We show that world models are a simple and effective continual reinforcement learning baseline. We study their effectiveness on Minigrid and Minihack continual reinforcement learning benchmarks and show that they outperform state-of-the-art task-agnostic continual reinforcement learning methods.
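
The first point of the abstract, keeping one replay buffer alive across task changes, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PersistentReplayBuffer`, the function `run_task_sequence`, the dictionary-shaped transitions, and the FIFO eviction policy are all assumptions made for illustration. The `task` field is stored only so the mixing of tasks can be inspected; a task-agnostic learner would not read it.

```python
import random
from collections import deque


class PersistentReplayBuffer:
    """Hypothetical replay buffer that is never reset between tasks."""

    def __init__(self, capacity, seed=0):
        # Assumption: FIFO eviction once full. Capacity must be large enough
        # to keep old-task data, otherwise earlier tasks are evicted first.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement over everything stored,
        # regardless of which task produced it.
        k = min(batch_size, len(self.storage))
        return self.rng.sample(list(self.storage), k)


def run_task_sequence(tasks, steps_per_task, buffer):
    """Collect placeholder transitions task by task into one shared buffer."""
    for task_id in tasks:
        for step in range(steps_per_task):
            # A real agent would store (observation, action, reward, ...).
            buffer.add({"task": task_id, "step": step})
        # Deliberately no buffer reset here: old experience stays available.
    return buffer


buffer = run_task_sequence(["A", "B"], 50, PersistentReplayBuffer(1000))
batch = buffer.sample(64)
tasks_in_batch = {t["task"] for t in batch}
```

Because the buffer is shared, a batch drawn while training on the second task still contains transitions from the first, which is what lets a world model keep fitting earlier dynamics under these assumptions.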

Supplementary Material: zip

Community Implementations: [3 code implementations (CatalyzeX)](https://www.catalyzex.com/paper/the-surprising-effectiveness-of-latent-world/code)
