Learning to Prioritize Planning Updates in Model-based Reinforcement Learning

Bradley Burega; John D Martin; Michael Bowling

Published: 21 Oct 2022, Last Modified: 05 May 2023. NeurIPS 2022 Workshop MetaLearn Poster.

Keywords: Reinforcement Learning, Meta-Learning, Planning

TL;DR: We use meta-learning to determine the states from which planning should begin in a stochastic, non-stationary RL task.

Abstract: Prioritizing the states and actions from which policy improvement is performed can improve the sample efficiency of model-based reinforcement learning systems. Although much is already known about prioritizing planning updates, more needs to be understood to operationalize these ideas in complex settings that involve non-stationary and stochastic transition dynamics, large numbers of states, and scalable function approximation architectures. Our paper presents an online meta-learning algorithm to address these needs. The algorithm finds distributions that encode priority in their probability mass. The paper evaluates the algorithm in a domain with a changing goal and with a fixed, generative transition model. Results show that prioritizing planning updates from samples of the meta-learned distribution significantly improves sample efficiency over fixed baseline distributions. Additionally, they point to a number of interesting opportunities for future research.
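The abstract does not specify the meta-learning objective, so the following is only a minimal sketch of the general idea under stated assumptions. A tabular Dyna-style planner draws the start state of each planning update from a softmax distribution over learnable logits, so priority is encoded in that distribution's probability mass. The logits are adjusted online with a REINFORCE-style (score-function) rule. The meta-signal used here, the magnitude of the value change produced by each planning update, is a hypothetical stand-in borrowed from prioritized sweeping and is not the authors' objective. `PrioritizedPlanner`, `chain_model`, the slip probability, and all step sizes are illustrative assumptions.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


class PrioritizedPlanner:
    """Hypothetical sketch: planning start states sampled from a learned distribution."""

    def __init__(self, n_states, n_actions, model, gamma=0.95, alpha=0.5,
                 meta_lr=0.1, seed=0):
        self.q = np.zeros((n_states, n_actions))  # tabular action values
        self.logits = np.zeros(n_states)          # one priority logit per state
        self.model = model                        # generative model: (s, a, rng) -> (r, s')
        self.gamma, self.alpha, self.meta_lr = gamma, alpha, meta_lr
        self.rng = np.random.default_rng(seed)

    def start_distribution(self):
        return softmax(self.logits)

    def plan_step(self):
        probs = self.start_distribution()
        s = self.rng.choice(len(probs), p=probs)    # prioritized start state
        a = self.rng.integers(self.q.shape[1])      # uniform action (assumption)
        r, s2 = self.model(s, a, self.rng)          # simulated transition
        delta = r + self.gamma * self.q[s2].max() - self.q[s, a]
        self.q[s, a] += self.alpha * delta          # Q-learning planning update
        # Hypothetical meta-signal: size of the value change just made.
        signal = abs(self.alpha * delta)
        # Score-function gradient of log softmax(logits)[s]: onehot(s) - probs.
        grad_logp = -probs
        grad_logp[s] += 1.0
        self.logits += self.meta_lr * signal * grad_logp
        return s, delta


def chain_model(s, a, rng, n_states=5, slip=0.1):
    # Toy stochastic chain (assumption): action 1 moves right, action 0 moves
    # left, and with probability `slip` the move is reversed. Reward 1 for
    # entering the rightmost state.
    move = 1 if a == 1 else -1
    if rng.random() < slip:
        move = -move
    s2 = min(max(s + move, 0), n_states - 1)
    r = 1.0 if (s2 == n_states - 1 and s != n_states - 1) else 0.0
    return r, s2


planner = PrioritizedPlanner(n_states=5, n_actions=2, model=chain_model, seed=0)
for _ in range(500):
    planner.plan_step()
probs = planner.start_distribution()
print(np.round(probs, 3))
```

The score-function update above has no baseline, so it is noisy; subtracting a baseline is the usual refinement. With function approximation, the per-state logits would be replaced by a parameterized model of the start-state distribution.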