Lifelong Learning of Skills with Fast and Slow Reinforcement Learning
- Keywords: Lifelong Learning, Reinforcement Learning
- Abstract: Humans and animals excel in lifelong learning settings, in which an agent must leverage past knowledge to solve a sequence of tasks assigned one after the other. However, reinforcement learning algorithms perform poorly in lifelong settings when little task data is available at the beginning of each new task. To address this, we introduce a hybrid reinforcement learning method that merges model-based planning with explicit policy optimization, benefiting from the zero-shot generalization of the former and the high asymptotic performance of the latter. To ensure that the agent performs competently throughout its lifetime, we propose a simple offline evaluation scheme that decides which policy to use by evaluating the performance of each method at the beginning of every episode. Furthermore, to maximize the agent's learning capabilities, we integrate tools from offline reinforcement learning to guarantee continuous improvement even under the distribution shift caused by collecting data with two different policies. On three continuous control domains spanning reaching, locomotion, and manipulation, we find that the agent yields acceptable performance even on tasks encountered for the first time and, compared to other lifelong RL algorithms, improves task performance in the long run as it accumulates task experience.
- One-sentence Summary: We introduce a hybrid reinforcement learning method that enables an agent to perform competently throughout its lifetime in lifelong learning settings.
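The per-episode policy selection described in the abstract can be illustrated with a minimal sketch. All names and the scoring values below are hypothetical; the paper's actual offline evaluation procedure is not specified here, so this only shows the selection logic: score each candidate policy offline at the start of an episode, then act with the highest-scoring one.

```python
def select_policy(estimated_returns):
    """Pick the policy expected to perform best this episode.

    estimated_returns: dict mapping policy name -> offline return estimate
    (illustrative; the real evaluation scheme comes from the paper).
    """
    return max(estimated_returns, key=estimated_returns.get)

# Toy numbers: early in a new task the model-based planner's zero-shot
# estimate dominates; later, the optimized policy typically overtakes it.
early = {"model_based_planner": 0.8, "learned_policy": 0.3}
late = {"model_based_planner": 0.8, "learned_policy": 0.95}
print(select_policy(early))  # → model_based_planner
print(select_policy(late))   # → learned_policy
```

The design intent, per the abstract, is that this switch gives acceptable zero-shot performance on unseen tasks while still allowing the learned policy to take over once it has been trained on enough task data.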