Catastrophic Negative Transfer: An Overlooked Problem in Continual Reinforcement Learning

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Continual reinforcement learning, negative transfer, SAC, Behavioral Cloning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We identify the catastrophic negative transfer problem in continual RL and propose Reset & Distill (R&D) to mitigate it.
Abstract: Continual Reinforcement Learning (CRL) has recently witnessed significant advancements, but negative transfer, a phenomenon in which policy training for a new task fails when it follows a specific previous task, has been largely overlooked. In this paper, we shed light on the prevalence and catastrophic nature of negative transfer in CRL through systematic experiments on the Meta-World RL environments. Our findings show that this phenomenon is distinct from the mere reduction in plasticity or capacity observed in conventional RL algorithms. We then introduce a simple yet effective baseline, Reset \& Distill (R\&D), to address negative transfer in CRL. R\&D combines resetting the agent's online actor and critic networks before learning each new task with an offline learning step that distills knowledge from the online actor's and previous experts' action probabilities. As a result, our method successfully mitigates both catastrophic negative transfer and forgetting in CRL. We carried out extensive experiments on long sequences of Meta-World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R\&D to mitigate its detrimental effects.
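The abstract's two-phase scheme (reset the online networks per task, then distill the resulting expert into a retained policy) can be sketched in a few lines. This is only an illustrative toy, not the authors' implementation: `Policy`, `distill`, and the random "expert" are hypothetical stand-ins, the online SAC training and critic are omitted, and distillation is reduced to behavioral cloning of action probabilities.

```python
# Toy sketch of the Reset & Distill (R&D) loop. All names here (Policy,
# distill, online_actor) are hypothetical illustrations; the online RL
# phase is replaced by a placeholder expert, and replay of previous tasks'
# states/probabilities during distillation is omitted for brevity.
import numpy as np

rng = np.random.default_rng(0)

class Policy:
    """A toy linear-softmax policy over discrete actions."""
    def __init__(self, obs_dim, n_actions):
        self.W = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def probs(self, obs):
        z = obs @ self.W
        z -= z.max(axis=-1, keepdims=True)   # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

def distill(student, states, teacher_probs, lr=0.5, steps=300):
    """Offline step: match the student's action probabilities to the
    teacher's by minimizing cross-entropy (behavioral cloning)."""
    for _ in range(steps):
        p = student.probs(states)
        # gradient of softmax cross-entropy w.r.t. W with soft targets
        grad = states.T @ (p - teacher_probs) / len(states)
        student.W -= lr * grad

obs_dim, n_actions = 4, 3
continual_policy = Policy(obs_dim, n_actions)   # retained across tasks

for task in range(3):
    # 1) Reset: a *fresh* online actor (and critic, omitted) is trained on
    #    the new task, so no negative transfer carries over. A randomly
    #    initialized policy stands in for the trained expert here.
    online_actor = Policy(obs_dim, n_actions)

    # 2) Distill: clone the new expert's action probabilities into the
    #    continually maintained policy on states from the new task.
    states = rng.normal(size=(256, obs_dim))
    targets = online_actor.probs(states)
    distill(continual_policy, states, targets)
```

The key design point the sketch reflects is the separation of concerns: the reset sidesteps negative transfer during online learning, while the offline distillation step preserves old knowledge in a single policy, addressing forgetting.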
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2524