Keywords: Few-shot Learning, Lifelong Meta RL, Multi-Task RL, PAC-Bayes Bound, Generalization Error Bound
Abstract: We propose a new empirical PAC-Bayes approach for developing lifelong reinforcement learning algorithms with theoretical guarantees. The main idea is to extend PAC-Bayes theory from supervised learning to the reinforcement learning regime. More specifically, we train a distribution over policies and gradually improve the distribution parameters by optimizing a generalization error bound using trajectories from each task. As the agent sees more tasks, it learns better prior distributions over policies, resulting in tighter generalization bounds and improved future learning. We evaluate the proposed algorithms against recent state-of-the-art methods on several OpenAI Gym and MuJoCo environments and show that they adapt to new tasks more efficiently by continuously distilling knowledge from past tasks.
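The link the abstract draws between a better prior and a tighter bound can be illustrated with a small sketch. It does not reproduce the bound derived in the paper. It assumes the classical McAllester/Maurer-style PAC-Bayes bound for losses in [0, 1], and diagonal Gaussian distributions over policy parameters. The function names (kl_diag_gauss, pac_bayes_bound) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def kl_diag_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between diagonal Gaussians given as lists of means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def pac_bayes_bound(empirical_loss, kl, n, delta=0.05):
    """Classical PAC-Bayes upper bound on the expected loss (assumed form, losses in [0, 1]).

    With probability at least 1 - delta over n i.i.d. samples:
        L(Q) <= L_hat(Q) + sqrt((KL(Q || P) + ln(2 * sqrt(n) / delta)) / (2 * n))
    """
    complexity = (kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    return empirical_loss + math.sqrt(complexity)


# Hypothetical posterior over two policy parameters after adapting to a new task.
posterior_mu, posterior_sigma = [1.0, -0.5], [0.3, 0.3]

# A generic prior versus a prior distilled from earlier tasks (both made up).
generic_prior = ([0.0, 0.0], [1.0, 1.0])
learned_prior = ([0.9, -0.4], [0.4, 0.4])

kl_generic = kl_diag_gauss(posterior_mu, posterior_sigma, *generic_prior)
kl_learned = kl_diag_gauss(posterior_mu, posterior_sigma, *learned_prior)

# Same empirical loss and sample count; only the prior differs.
bound_generic = pac_bayes_bound(0.2, kl_generic, n=200)
bound_learned = pac_bayes_bound(0.2, kl_learned, n=200)

print(f"KL with generic prior: {kl_generic:.3f}, bound: {bound_generic:.3f}")
print(f"KL with learned prior: {kl_learned:.3f}, bound: {bound_learned:.3f}")
```

With the empirical loss and number of samples held fixed, the prior that sits closer to the posterior yields a smaller KL term and therefore a smaller bound, which is the mechanism the abstract attributes to accumulating knowledge across tasks.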
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (i.e., none of the above)
Supplementary Material: zip