Recurrent Policies Are Not Enough for Continual Reinforcement Learning

Published: 07 Jun 2024, Last Modified: 09 Aug 2024 · RLC 2024 ICBINB Poster · CC BY 4.0
Keywords: Continual Reinforcement Learning, Recurrent Policies, Catastrophic Forgetting, Non-stationary Environments
TL;DR: This paper examines the combination of recurrent networks with policy networks in Continual Reinforcement Learning, identifying two failure modes of catastrophic forgetting: embedding collapse and embedding drift.
Abstract: Continual Reinforcement Learning (CRL) aims to develop algorithms that adapt to non-stationary sequences of tasks. A promising recent approach uses Recurrent Neural Networks (RNNs) to learn contextual Markov Decision Process (MDP) embeddings, enabling a reinforcement learning (RL) agent to discern the optimality of actions across diverse tasks. In this study, we examine two critical failure modes in the learning of these contextual MDP embeddings. Specifically, we find that RNNs are prone to catastrophic forgetting, which manifests in two distinct ways: (i) embedding collapse, where agents initially learn a contextual task structure that later collapses to a single task, and (ii) embedding drift, where learning embeddings for new MDPs interferes with the embeddings the RNN outputs for previous MDPs in the sequence, leading to suboptimal performance of downstream policy networks conditioned on stale embeddings. We explore the effects of various objective functions and network architectures on these failure modes, revealing that one of these failure modes consistently emerges across different setups.
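To make the setup described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of a recurrent-context architecture: a GRU summarizes recent transitions into a contextual MDP embedding, and a downstream policy is conditioned on that embedding. All names, dimensions, and the cosine-similarity diagnostic for collapse/drift are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an RNN-conditioned policy for continual RL.
# ContextEncoder, Policy, and all dimensions are assumed for illustration.
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):
    """GRU mapping a history of (obs, action, reward) tuples to an MDP embedding."""

    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 32):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 1, embed_dim, batch_first=True)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time, obs_dim + act_dim + 1); last hidden state is the embedding
        _, h_n = self.gru(history)
        return h_n.squeeze(0)


class Policy(nn.Module):
    """Feed-forward policy conditioned on the current observation and the task embedding."""

    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs: torch.Tensor, embedding: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, embedding], dim=-1))


if __name__ == "__main__":
    obs_dim, act_dim, embed_dim = 8, 2, 32
    encoder = ContextEncoder(obs_dim, act_dim, embed_dim)
    policy = Policy(obs_dim, act_dim, embed_dim)

    # Fake transition histories from two different tasks in a continual sequence.
    hist_task_a = torch.randn(1, 20, obs_dim + act_dim + 1)
    hist_task_b = torch.randn(1, 20, obs_dim + act_dim + 1)
    emb_a, emb_b = encoder(hist_task_a), encoder(hist_task_b)

    # One crude diagnostic in the spirit of the failure modes discussed in the paper:
    # if embeddings for distinct tasks become nearly identical, the context has
    # "collapsed"; if an old task's embedding moves far from a stored snapshot after
    # training on new tasks, it has "drifted".
    collapse_score = torch.cosine_similarity(emb_a, emb_b).item()  # ~1.0 suggests collapse
    print(f"cosine similarity between task embeddings: {collapse_score:.3f}")

    obs = torch.randn(1, obs_dim)
    print("action logits:", policy(obs, emb_a))
```

In this sketch, embedding drift could be tracked by caching each task's embedding when that task is last seen and re-encoding the same history later in training; a growing distance between the cached and re-computed embeddings would indicate interference from subsequent tasks.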
Submission Number: 8