Think Fast! Learning to Control Online Reasoning in Stochastic Environments
Keywords: Markov Decision Processes, Planning, Reinforcement Learning
TL;DR: We present a resource-rational metareasoning method that learns how best to interleave thinking and acting in stochastic domains.
Abstract: When an autonomous agent's decision-making has resource costs or incurs potential real-world consequences, its performance can be improved by reasoning about its own decision-making process. This is known as metareasoning, and is a key capability of rational agents. However, existing metareasoning methods have significant limitations. Most apply only to the offline setting, controlling only how long the agent should think before executing its current best solution. Few methods exist for online metareasoning, where the agent can interleave thinking and acting, and these make strong simplifying assumptions that limit their performance. It is rarer still for methods to be applicable to stochastic problems, or to consider the effects of the environment on the agent's planning process.
In this work, we extend a learning-based metareasoning method for probabilistic planning to the online setting. The framework enables the agent to learn when, where, and how to think in order to make better decisions in stochastic environments. We demonstrate that our method outperforms several baselines across two domain distributions, each highlighting different benefits of online metareasoning.
Area: Search, Optimization, Planning, and Scheduling (SOPS)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 851