Learning to Prioritize Planning Updates in Model-based Reinforcement Learning

04 Oct 2022, 23:09 (modified: 26 Nov 2022, 09:49)
NeurIPS 2022 Workshop MetaLearn Poster
Readers: Everyone
Keywords: Reinforcement Learning, Meta-Learning, Planning
TL;DR: We use meta-learning to determine the states from which planning should begin in a stochastic, non-stationary RL task.
Abstract: Prioritizing the states and actions from which policy improvement is performed can improve the sample efficiency of model-based reinforcement learning systems. Although much is already known about prioritizing planning updates, more needs to be understood to operationalize these ideas in complex settings that involve non-stationary and stochastic transition dynamics, large numbers of states, and scalable function approximation architectures. Our paper presents an online meta-learning algorithm to address these needs. The algorithm finds distributions that encode priority in their probability mass. The paper evaluates the algorithm in a domain with a changing goal and with a fixed, generative transition model. Results show that prioritizing planning updates from samples of the meta-learned distribution significantly improves sample efficiency over fixed baseline distributions. Additionally, they point to a number of interesting opportunities for future research.