Keywords: continual learning, reinforcement learning, q-learning, replay
TL;DR: Reinforcement learning in non-stationary environments using a replay method from the field of CL whose cost remains constant over time, thus enabling very long-term learning.
Abstract: This contribution proposes adiabatic reinforcement learning (ARL), a new method for continual reinforcement learning (CRL).
In CRL, we assume a non-stationary environment partitioned into \textit{tasks}. To avoid catastrophic forgetting (CF), RL requires
large replay buffers, which lead to very slow learning and high memory requirements.
To remedy this, ARL adopts a wake-sleep scheme: sleep phases perform slow learning of internal representations from high-error transitions, while wake phases perform fast learning of policies, i.e., mappings from representations to actions,
and collect new high-error transitions.
Representation learning is performed by \textit{adiabatic replay} (AR), a recent CL technique we adapted to the RL setting. AR uses selective, internal replay of samples
that are likely to be affected by forgetting. Since this process is conditioned on incoming samples only, it has constant time complexity with respect to the number of tasks. Further benefits include
fast adaptation to new tasks and a very low memory footprint due to the complete absence of replay buffers.
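To make the wake-sleep scheme concrete, the following is a minimal, purely illustrative sketch of the control flow the abstract describes. All names, thresholds, and data structures here are assumptions for illustration, not the authors' actual ARL/AR implementation (which learns neural representations via adiabatic replay); the dictionaries stand in for learned models.

```python
from collections import deque

class AdiabaticRLSketch:
    """Hypothetical sketch of the wake-sleep loop from the abstract.
    Names and the error threshold are illustrative assumptions."""

    def __init__(self, error_threshold=0.5, staging_size=100):
        self.error_threshold = error_threshold      # what counts as "high-error"
        # Small staging area for recent high-error transitions;
        # note this is not a growing task-spanning replay buffer.
        self.high_error = deque(maxlen=staging_size)
        self.representation = {}  # stand-in for slowly learned representations
        self.policy = {}          # stand-in for a fast-learned policy head

    def wake_step(self, transition, td_error):
        """Wake phase: fast policy learning plus collection of
        high-error transitions for the next sleep phase."""
        state, action = transition
        self.policy[state] = action
        if abs(td_error) > self.error_threshold:
            self.high_error.append(transition)

    def sleep_phase(self):
        """Sleep phase: slow representation learning driven only by the
        collected high-error transitions. Because replay is conditioned
        on these incoming samples alone, per-phase cost does not grow
        with the number of past tasks."""
        for state, action in self.high_error:
            self.representation[state] = action
        n_consolidated = len(self.high_error)
        self.high_error.clear()
        return n_consolidated
```

The key structural point the sketch captures is that the sleep phase touches only the fixed-size staging buffer, never a task-indexed history, which is where the constant scaling claim comes from.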
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1053