Continual Reinforcement Learning (CRL) is a powerful tool that enables agents to learn a sequence of tasks, accumulating knowledge learned in the past and using it for problem-solving or future task learning. However, existing CRL methods all assume that the agent's capabilities remain static within dynamic environments, which does not reflect real-world scenarios where capabilities evolve. This paper introduces Self-Evolution Continual Reinforcement Learning (SE-CRL), a new and realistic problem setting in which the agent's action space continually changes. This poses a significant challenge for RL agents: how can policy generalization across different action spaces be achieved? Inspired by the cortical functions that lead to consistent human behavior, we propose an Action Representation Continual Reinforcement Learning framework (ARC-RL) to address this challenge. Our framework builds a representation space for actions through self-supervised learning on transitions, decoupling the agent's policy from any specific action space. For a new action space, the decoder of the action representation is expanded or masked for adaptation and fine-tuned with regularization to improve the stability of the policy. Furthermore, we release a benchmark based on MiniGrid to validate the effectiveness of methods for SE-CRL. Experimental results demonstrate that our framework significantly outperforms popular CRL methods by generalizing the policy across different action spaces.
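To make the core idea concrete, the sketch below illustrates how a policy can act in a fixed latent action space while a separate decoder maps latent actions to whatever concrete action space is currently available, and how that decoder might be expanded when new actions appear. This is a minimal illustrative sketch only: the class names, network sizes, and the specific expansion rule are assumptions for exposition, not the authors' implementation of ARC-RL.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumption: names and sizes are illustrative, not the
# authors' implementation). The policy outputs a latent action z in a
# fixed representation space; a decoder maps z to the *current* concrete
# action space, so the policy itself stays unchanged when actions change.

class ActionDecoder(nn.Module):
    def __init__(self, latent_dim: int, num_actions: int):
        super().__init__()
        self.head = nn.Linear(latent_dim, num_actions)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Logits over the concrete actions currently available.
        return self.head(z)

    def expand(self, new_num_actions: int) -> None:
        # When the action space grows, add output units while copying the
        # old weights, so previously learned mappings are preserved.
        old = self.head
        self.head = nn.Linear(old.in_features, new_num_actions)
        with torch.no_grad():
            self.head.weight[: old.out_features] = old.weight
            self.head.bias[: old.out_features] = old.bias


# Usage sketch: decode the same latent action before and after expansion.
decoder = ActionDecoder(latent_dim=8, num_actions=4)
z = torch.randn(1, 8)
print(decoder(z).shape)          # torch.Size([1, 4])
decoder.expand(new_num_actions=6)
print(decoder(z).shape)          # torch.Size([1, 6])
```

In this sketch, keeping the old output weights when expanding plays the role of the stability-oriented adaptation described above; in the actual framework, regularized fine-tuning of the decoder serves that purpose.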