Abstract: We propose DRAGO, a novel approach to continual model-based reinforcement learning that incrementally develops a world model across a sequence of tasks sharing the same state space and dynamics but differing in their reward functions. DRAGO comprises two key components: *Synthetic Experience Rehearsal*, which uses a generative model to produce synthetic experiences from past tasks, letting the agent reinforce previously learned dynamics without storing data, and *Regaining Memories Through Exploration*, which introduces an intrinsic reward that guides the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO preserves knowledge across tasks and achieves superior performance in a range of continual learning scenarios.
Lay Summary: Robots and other AI agents often need to master a series of tasks, one after another, without being allowed to keep every previous experience—think of a helper robot that moves from one apartment to the next, or a phone‑based assistant that must forget sensitive user data. Today’s learning systems quickly “forget” what they once knew and have to start almost from scratch each time.
Our paper introduces DRAGO, a two‑step “dream and explore” method that helps an agent hold on to what it has learned while still making room for new skills. First, the agent dreams: it trains a small simulator that can invent realistic memories of earlier tasks and rehearse them internally, so no raw data need be stored. Then it explores: it rewards itself for revisiting parts of the world it used to understand well, stitching old and new knowledge together.
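The two components described above can be sketched in code. This is a hypothetical, simplified illustration, not the authors' implementation: the class `SyntheticRehearsal`, the per-task "generator" callables, and the `intrinsic_reward` formula are all illustrative assumptions chosen to mirror the dream-and-explore idea.

```python
import random

class SyntheticRehearsal:
    """Generative replay sketch: one frozen generator per finished task
    invents transitions from that task, so no raw data is stored."""

    def __init__(self):
        self.generators = []  # callables, each yielding one synthetic transition

    def finish_task(self, generator):
        # Freeze a generative model trained on the task just completed.
        self.generators.append(generator)

    def sample(self, n):
        # Mix synthetic transitions from all past tasks to rehearse
        # old dynamics alongside real data from the current task.
        return [random.choice(self.generators)() for _ in range(n)]


def intrinsic_reward(old_model_error, new_model_error):
    """Illustrative bonus for 'regaining memories': reward states the
    previous world model predicted well (low old error) but the current
    model has forgotten (high new error)."""
    return max(0.0, new_model_error - old_model_error)
```

In this sketch, the rehearsal batch would be appended to the current task's replay data when training the world model, and the intrinsic bonus would be added to the task reward during exploration.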
In tests on grid‑world games and simulated robots, DRAGO kept its skills far better than existing methods, letting agents adapt to new goals faster and with less data—an important step toward lifelong, privacy‑aware AI.
Primary Area: Reinforcement Learning->Deep RL
Keywords: Deep Reinforcement Learning, Model-based Reinforcement Learning, Continual Learning, World Models
Submission Number: 7975