Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning

Jiahua Lan; Sen Zhang; Ruijun Liu; Haixia Pan; Dacheng Tao

25 Sept 2024 (modified: 25 Jan 2025), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: reinforcement learning, stability-plasticity dilemma, skill neuron

TL;DR: We propose Neuron-level Balance between Stability and Plasticity (NBSP), a novel DRL framework that operates at the level of individual neurons.

Abstract: In contrast to the inherent ability of humans to continuously acquire new knowledge, modern deep reinforcement learning (DRL) agents generally encounter a significant challenge: the stability-plasticity dilemma, which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). In this study, we propose Neuron-level Balance between Stability and Plasticity (NBSP) to tackle this challenge, by taking inspiration from the observation that both stability and plasticity are integrally linked to the expressive capabilities of networks, which are primarily determined by the behavior of individual neurons. To the best of our knowledge, this is the first work that addresses both stability and plasticity loss simultaneously in DRL at the level of neurons. Specifically, NBSP first (1) defines and identifies RL skill neurons that are crucial for knowledge retention through a goal-oriented method, and then (2) introduces a stability-plasticity balancing mechanism by employing gradient masking and experience replay techniques targeting these neurons to preserve the encoded memory related to existing skills while enhancing the learning capabilities of other neurons. Experimental results on the Meta-World and Atari benchmarks demonstrate that NBSP significantly outperforms existing approaches in balancing stability and plasticity. Furthermore, our findings underscore the pivotal role of the critic within this context, providing valuable insights for future research.
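The gradient-masking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how skill neurons are scored or what form the mask takes, so the set `skill_neurons`, the `scale` parameter, and the helper names `mask_neuron_gradients` and `sgd_step` are all hypothetical. The sketch assumes a single dense layer whose weight matrix has one row per neuron, and simply shrinks (here, zeroes) the gradient rows of the neurons marked as skill neurons so that an ordinary SGD step leaves them unchanged while the remaining neurons keep learning.

```python
from typing import List, Set

Matrix = List[List[float]]


def mask_neuron_gradients(grad: Matrix, skill_neurons: Set[int], scale: float = 0.0) -> Matrix:
    """Scale the gradient rows of skill neurons (row i = incoming weights of neuron i).

    scale=0.0 freezes those neurons completely; a value between 0 and 1
    would only slow their updates. Both choices are assumptions of this sketch.
    """
    return [
        [g * scale for g in row] if i in skill_neurons else list(row)
        for i, row in enumerate(grad)
    ]


def sgd_step(weights: Matrix, grad: Matrix, lr: float) -> Matrix:
    """Plain SGD update: w <- w - lr * g, applied element-wise."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grad)
    ]


# Toy layer with three neurons and two inputs each (illustrative numbers only).
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
grad = [[1.0, 1.0], [2.0, -2.0], [0.5, 0.5]]
skill_neurons = {1}  # hypothetical: neuron 1 was identified as a skill neuron

masked = mask_neuron_gradients(grad, skill_neurons)
updated = sgd_step(weights, masked, lr=0.1)

# Neuron 1 keeps its weights; neurons 0 and 2 are updated as usual.
print(updated[1] == weights[1])
```

In a real DRL agent the same masking would be applied to the gradients of the actor and critic networks before the optimizer step, and it would be combined with replay of earlier-task experience as the abstract describes; neither part is shown here.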

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 4055
