Keywords: continual reinforcement learning
Abstract: One roadblock to building general AI agents is the inability to continually learn and adapt to environmental changes without catastrophically forgetting previously acquired knowledge. This deficiency is closely linked to the fact that most reinforcement learning (RL) methods are designed under the critical assumption of fixed environment transition dynamics and a fixed reward function. To address these limitations, in this paper we first dive deeper into the less studied foundations of continual RL, focusing on defining the MDP distance and catastrophic forgetting based on the difference between optimal value functions. In particular, we analyze the learning behaviors of continual RL algorithms endowed with only stability or only plasticity. We further propose a theoretically principled continual RL algorithm that reweights the historical and current Bellman targets, explicitly balancing stability and plasticity in continual RL. We conduct rigorous experiments in the tabular setting to corroborate our analytical results, suggesting the potential of our proposed algorithm in real continual RL scenarios.
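The abstract's core idea of reweighting historical and current Bellman targets can be illustrated in the tabular setting it mentions. Below is a minimal sketch, not the paper's actual algorithm: a tabular Q-update whose target is a convex combination of the current-task Bellman target (plasticity) and a frozen old-task value (stability). The names `eta`, `Q_old`, and the function signature are illustrative assumptions.

```python
import numpy as np

def reweighted_q_update(Q, Q_old, s, a, r, s_next, alpha=0.1, gamma=0.99, eta=0.5):
    """One Q-learning step with a reweighted Bellman target (illustrative sketch).

    Q      : current Q-table, shape (num_states, num_actions), updated in place
    Q_old  : frozen Q-table from the previous task (stability anchor)
    eta    : weight on the current-task target; (1 - eta) weights the historical target
    """
    current_target = r + gamma * np.max(Q[s_next])   # plasticity: adapt to the new task
    historical_target = Q_old[s, a]                   # stability: retain the old-task value
    target = eta * current_target + (1.0 - eta) * historical_target
    Q[s, a] += alpha * (target - Q[s, a])             # standard TD-style update toward mixed target
    return Q
```

Setting `eta = 1` recovers ordinary Q-learning on the current task (pure plasticity), while `eta = 0` pins the estimate to the old-task values (pure stability); intermediate values trade off the two, mirroring the stability-plasticity balance described in the abstract.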
Submission Number: 123