Keywords: continual learning, deep reinforcement learning
TL;DR: We introduce a method that iteratively learns a subspace of policies in a continual reinforcement learning setting where tasks are presented sequentially.
Abstract: The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. However, existing methods typically rely either on fixed-size models, which cannot capture many diverse behaviors, or on growing-size models, which scale poorly with the number of tasks. In this paper, we introduce Continual Subspace of Policies (CSP), a method that iteratively learns a subspace of policies in the continual reinforcement learning setting, where tasks are presented sequentially. The subspace's high expressivity allows our method to strike a good balance between stability (i.e., not forgetting prior tasks) and plasticity (i.e., learning new tasks), while the number of parameters grows sublinearly with the number of tasks. In addition, CSP displays good transfer: it can quickly adapt to new tasks, including combinations of previously seen ones, without additional training. Finally, CSP outperforms state-of-the-art methods on a wide range of scenarios in two different domains. An interactive visualization of the subspace can be found at https://continual-subspace-policies-streamlit-app-gofujp.streamlitapp.com/.