Uncertainty-Based Experience Replay for Task-Agnostic Continual Reinforcement Learning

TMLR Paper 3378 Authors

23 Sept 2024 (modified: 04 Oct 2024) · Under review for TMLR · CC BY 4.0
Abstract: Model-based reinforcement learning uses a learned dynamics model to imagine actions and select those with the best expected outcomes. An experience replay buffer collects the outcomes of all actions executed in the environment; this buffer is then used to iteratively train the dynamics model. However, as the complexity and scale of tasks increase, training times and memory requirements can grow drastically without necessarily retaining useful experiences. Continual learning proposes a more realistic scenario in which tasks are learned in sequence, and the replay buffer can help mitigate catastrophic forgetting. However, it is not realistic to expect the buffer to grow indefinitely as the sequence advances. Furthermore, storing every single experience executed in the environment does not necessarily yield a more accurate model. We argue that the replay buffer should be kept at the minimal size necessary to retain relevant experiences that cover both common and rare states. Therefore, we propose an uncertainty-based replay buffer filtering strategy to enable an effective implementation of continual learning agents using model-based reinforcement learning. We show that the combination of the proposed strategies leads to reduced training times, a smaller replay buffer, and less catastrophic forgetting, all while maintaining performance.
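To make the filtering idea concrete, the following is a minimal sketch of an uncertainty-filtered replay buffer: when the buffer is full, the lowest-uncertainty transition is evicted so that rare or poorly modelled states are preferentially retained. The class and function names (UncertaintyFilteredBuffer, ensemble_disagreement) are illustrative only, and the use of ensemble disagreement as the uncertainty measure is an assumption made here, not necessarily the paper's exact method.

```python
import numpy as np


class UncertaintyFilteredBuffer:
    """Fixed-capacity replay buffer that evicts the lowest-uncertainty
    transition when full, keeping experiences the model is least sure about."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.transitions = []  # (state, action, reward, next_state) tuples
        self.scores = []       # per-transition uncertainty scores

    def add(self, transition, uncertainty):
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.scores.append(uncertainty)
            return
        # Buffer full: replace the most redundant (lowest-uncertainty) entry,
        # but only if the incoming transition is more informative.
        i = int(np.argmin(self.scores))
        if uncertainty > self.scores[i]:
            self.transitions[i] = transition
            self.scores[i] = uncertainty

    def sample(self, batch_size, rng=np.random):
        idx = rng.choice(len(self.transitions), size=batch_size, replace=True)
        return [self.transitions[i] for i in idx]


def ensemble_disagreement(ensemble, state, action):
    """One common uncertainty proxy: variance of next-state predictions
    across an ensemble of learned dynamics models (an assumption here)."""
    preds = np.stack([model(state, action) for model in ensemble])
    return float(preds.var(axis=0).mean())
```

In this sketch, each dynamics model in `ensemble` is any callable mapping (state, action) to a predicted next state; high disagreement marks transitions the current model explains poorly, which is what makes them worth keeping under a fixed memory budget.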
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 3378