Prediction and Control in Continual Reinforcement Learning

Published: 21 Sept 2023 · Last Modified: 20 Dec 2023 · NeurIPS 2023 poster
Keywords: reinforcement learning, continual reinforcement learning, lifelong learning, never-ending learning, prediction, control, multi-task learning, complementary learning systems
TL;DR: We propose decomposing the value function into two components and learning them at different timescales in continual reinforcement learning.
Abstract: Temporal difference (TD) learning is often used to update the estimate of the value function that RL agents use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a _permanent_ value function, which holds general knowledge that persists over time, and a _transient_ value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach significantly improves performance on both prediction and control problems.
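To make the decomposition concrete, here is a minimal tabular sketch in Python. It is based only on the abstract: the learning rates, the consolidation schedule, and the reset rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Illustrative sketch of a permanent/transient value decomposition,
# as described in the abstract. The update rules below are assumptions:
# the transient component adapts quickly via TD learning on the combined
# estimate, while the permanent component slowly absorbs transient
# knowledge on a coarser timescale (e.g., when the task changes).

n_states = 10
alpha_fast = 0.5   # fast learning rate for the transient component (assumed)
alpha_slow = 0.05  # slow learning rate for the permanent component (assumed)
gamma = 0.99

V_perm = np.zeros(n_states)   # persistent, general knowledge
V_trans = np.zeros(n_states)  # quick adaptation to the current situation

def value(s):
    """Combined value estimate used by the agent."""
    return V_perm[s] + V_trans[s]

def td_update(s, r, s_next):
    """Fast timescale: TD update applied only to the transient component."""
    td_error = r + gamma * value(s_next) - value(s)
    V_trans[s] += alpha_fast * td_error

def consolidate():
    """Slow timescale: fold transient knowledge into the permanent
    component, then reset the transient part (hypothetical schedule)."""
    global V_perm, V_trans
    V_perm = V_perm + alpha_slow * V_trans
    V_trans = np.zeros(n_states)
```

In this sketch, `td_update` would be called on every transition, while `consolidate` would be invoked only occasionally, which is one plausible reading of "two components which update at different timescales".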
Supplementary Material: pdf
Submission Number: 14728