Keywords: Reinforcement Learning, Lifelong Learning
TL;DR: We theoretically analyze how the optimal value function changes across tasks and derive a method for non-negative transfer of value functions in Lifelong Reinforcement Learning.
Abstract: We consider the problem of reusing prior experience when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and focus on the study and exploitation of the optimal value function's Lipschitz continuity in the task space with respect to that metric. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm that exploits this continuity to accelerate learning. We illustrate the benefits of the method in Lifelong RL experiments.
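The value-transfer idea described in the abstract can be illustrated with a minimal sketch: if the optimal value function is Lipschitz continuous in the task space, then a Q-function learned on a previous task, inflated by a slack proportional to the distance between the two tasks, remains a valid optimistic upper bound for the new task and can replace the trivial V_max initialization of an R-Max-style PAC-MDP learner. The Lipschitz constant, the task distance, and the helper function below are illustrative assumptions, not the paper's exact quantities or algorithm.

```python
# Sketch (under the assumptions stated above): transfer an optimistic upper
# bound on Q* from a previously solved task to a new one, assuming
# |Q*_M(s, a) - Q*_M'(s, a)| <= L * d(M, M') for a task metric d and constant L.
import numpy as np

def transferred_upper_bound(q_prev, task_distance, lipschitz_const, v_max):
    """Inflate the previous task's Q-values by the Lipschitz slack and clip at
    V_max so the result is still a valid optimistic initialization."""
    return np.minimum(q_prev + lipschitz_const * task_distance, v_max)

# Usage example: an R-Max-style learner on the new task starts from this bound
# instead of the uninformative V_max, tightening exploration wherever the two
# tasks are close under the metric.
gamma, r_max = 0.9, 1.0
v_max = r_max / (1.0 - gamma)
q_prev = np.random.uniform(0.0, v_max, size=(10, 4))  # stand-in Q* estimate from an earlier task
q_init = transferred_upper_bound(q_prev, task_distance=0.2,
                                 lipschitz_const=1.0 / (1.0 - gamma), v_max=v_max)
```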
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2001.05411/code)