Keywords: Continual Reinforcement Learning, Reinforcement Learning, Lifelong Reinforcement Learning, Value Functions, Permanent and Transient Value Functions
TL;DR: We propose using different sets of feature representations to estimate permanent and transient value functions.
Abstract: Continual reinforcement learning agents struggle to adapt to new situations while retaining past knowledge, a tension known as the stability–plasticity trade-off. An appealing solution is to decompose the agent's predictions into permanent and transient components---one for long-term retention and the other for rapid adaptation---thereby achieving a better balance~\citep{anand2023prediction}. Building on this idea, we propose using different sets of feature representations to estimate the permanent and transient value functions, enabling even faster adaptation. We demonstrate the effectiveness of our approach on small-scale examples for both prediction and control tasks, analyze its theoretical properties, and show its benefits on the Craftax-Classic benchmark using a novel non-parametric approximator for transient value function estimation. Our method supports fully online learning and outperforms the PQN baseline.
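For intuition only, below is a minimal sketch of the permanent/transient decomposition with separate feature sets, in plain Python/NumPy. The class name, step sizes (`alpha_perm`, `alpha_trans`), and the `consolidate` step are illustrative assumptions following the general scheme of \citep{anand2023prediction}, not the paper's actual implementation (which, per the abstract, uses a non-parametric approximator for the transient value function).

```python
import numpy as np

class PermanentTransientValue:
    """Sketch: linear V(s) = w_perm . phi_perm(s) + w_trans . phi_trans(s),
    where phi_perm and phi_trans are *different* feature representations."""

    def __init__(self, d_perm, d_trans, alpha_trans=0.1, alpha_perm=0.01, gamma=0.99):
        self.w_perm = np.zeros(d_perm)    # slow weights: long-term retention
        self.w_trans = np.zeros(d_trans)  # fast weights: rapid within-task adaptation
        self.alpha_trans = alpha_trans    # illustrative step sizes, not from the paper
        self.alpha_perm = alpha_perm
        self.gamma = gamma

    def value(self, phi_perm, phi_trans):
        # Prediction is the sum of the permanent and transient components.
        return self.w_perm @ phi_perm + self.w_trans @ phi_trans

    def td_update(self, phi_perm, phi_trans, reward,
                  phi_perm_next, phi_trans_next, done):
        # Online TD(0): only the transient weights chase the TD error,
        # bootstrapping from the *combined* value estimate.
        bootstrap = 0.0 if done else self.gamma * self.value(phi_perm_next, phi_trans_next)
        delta = reward + bootstrap - self.value(phi_perm, phi_trans)
        self.w_trans += self.alpha_trans * delta * phi_trans

    def consolidate(self, phi_perm, phi_trans):
        # At a task boundary: nudge the permanent weights toward the combined
        # estimate, then reset the transient weights for the next task.
        err = self.value(phi_perm, phi_trans) - self.w_perm @ phi_perm
        self.w_perm += self.alpha_perm * err * phi_perm
        self.w_trans[:] = 0.0
```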
Primary Area: reinforcement learning
Submission Number: 22814