Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming ChallengesDownload PDF

Published: 03 Mar 2023, Last Modified: 20 Apr 2023RRL 2023 PosterReaders: Everyone
Keywords: Reinforcement Learning, Continual Learning, Task-agnostic, Recurrent Models
TL;DR: We find that recurrent model-free RL combined with experience replay applied to task-agnostic continual RL surprisingly and consistently outperforms task-aware baselines, as well as matches its multi-task (soft-)upper bound
Abstract: We study methods for task-agnostic continual reinforcement learning (TACRL). TACRL combines the difficulties of partially observable RL (due to task agnosticism) and the challenges of continual learning (CL), which involves learning on a non-stationary sequence of tasks. As such, TACRL is important in real-world applications where agents must continuously adapt to changing environments. Our focus is on a previously unexplored and straightforward baseline for TACRL called replay-based recurrent RL (3RL). This approach augments an RL algorithm with recurrent mechanisms to mitigate partial observability and experience replay mechanisms to prevent catastrophic forgetting in CL. We pose a counterintuitive hypothesis that 3RL could outperform its soft upper bounds prescribed by previous literature: multi-task learning (MTL) methods that do not have to deal with non-stationary data distributions, as well as task-aware methods that can operate under full observability. Specifically, we believe that the challenges that arise in certain training regimes could be best overcome by 3RL enabled by its ability to perform \emph{fast adaptation}, compared to task-aware approaches, which focus on task memorization. We extensively test our hypothesis by performing a large number of experiments on synthetic data as well as continuous-action multi-task and continual learning benchmarks where our results provide strong evidence that validates our hypothesis.
Track: Technical Paper
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
2 Replies