TL;DR: Reducing churn prevents the rank decrease of the NTK, and thus mitigates plasticity loss and improves continual RL.
Abstract: Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL through the lens of churn: network output variability on held-out data induced by training on each batch. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; and (2) reducing churn helps prevent rank collapse and adaptively adjusts the step size of regular RL gradients. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate that it improves learning performance and outperforms baselines across a diverse range of continual learning environments on the OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.
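To make the notion of churn concrete, here is a minimal numpy sketch of measuring it for a linear model: churn is taken as the mean change in outputs on a held-out reference batch caused by one gradient step on a training batch. This is an illustration of the definition only, not the paper's C-CHAIN implementation; all variable names (`X_ref`, `theta`, `lr`) and the linear/MSE setup are illustrative assumptions.

```python
import numpy as np

def churn(theta_before, theta_after, X_ref):
    """Mean absolute output change on a held-out reference batch
    induced by a parameter update (the 'churn' of that update)."""
    return np.mean(np.abs(X_ref @ theta_after - X_ref @ theta_before))

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 1))                      # linear model parameters
X_train, y_train = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
X_ref = rng.normal(size=(16, 4))                     # reference batch for measuring churn

# One gradient step on the MSE loss over the training batch.
lr = 0.05
grad = 2 * X_train.T @ (X_train @ theta - y_train) / len(X_train)
theta_new = theta - lr * grad

c = churn(theta, theta_new, X_ref)
```

For a linear model, churn scales linearly with the step size, which illustrates why controlling churn acts like an adaptive adjustment of the effective gradient step; in a deep network the relationship runs through the NTK rather than being exactly linear.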
Lay Summary: Natural intelligent creatures can learn throughout their lifetimes; human beings, for example, keep absorbing new information and learning new tasks every day. However, this kind of learnability, or plasticity, is non-trivial for artificial intelligence (AI). Most existing AI achievements are built to perform specific tasks at a near-human or super-human level; the continual learning ability of AI agents remains an open challenge.
In this paper, we study the plasticity issue of AI methods that learn a temporal sequence of tasks. We show that one cause of the plasticity issue is unregularized internal generalization, or interference, within AI models. Based on formal analysis and empirical investigation, we propose a regularization method called C-CHAIN to suppress these internal behaviors, and we demonstrate that it successfully mitigates the plasticity issue and improves the learnability of AI models in continual learning scenarios.
Our work helps to better understand the learning behaviors and failure modes of AI models, and our method and the idea behind it offer an easy-to-implement option for continual learning problems. Our findings also highlight a distinction between natural intelligence and AI, from which further inspiration could be drawn toward AI models with a higher level of intelligence.
Link To Code: https://github.com/bluecontra/C-CHAIN
Primary Area: Reinforcement Learning->Deep RL
Keywords: Plasticity, Continual Learning, Reinforcement Learning, Generalization
Submission Number: 1552