Time is of the Essence: Why Decision-Time Planning Costs Matter

Published: 04 Jun 2024, Last Modified: 19 Jul 2024 · Finding the Frame: RLC 2024 Poster · CC BY 4.0
Keywords: search, planning, decision-time planning
TL;DR: What is the best search algorithm? Not one that takes a long time. We build a search algorithm that chooses when to use more compute.
Abstract: The goal of this work is to build agents that use resources efficiently during decision-time planning. Such agents should learn to dynamically spend more compute at difficult, critical decision points, and less otherwise. To this end, we propose timed MDPs and timed policies, which augment MDPs and policies to explicitly factor in the cost of time and compute usage. By extending MDPs in this way, agents can learn to trade off the cost of planning against potentially higher rewards. To make our point concretely, we modify an existing algorithm, Thinker, to use a variable amount of compute for each decision. We then train it to maximize a reward that includes a context-dependent penalty for using more compute. Our modified algorithm, Dynamic Thinker, learns to use compute more efficiently than Thinker and AlphaZero. More specifically, it reaches similar performance levels using fewer planning steps in experiments on a simple knapsack problem.
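The trade-off the abstract describes can be sketched as a reward shaped by a per-step compute penalty. The function and constants below are illustrative assumptions for intuition only, not the paper's actual formulation of timed MDPs.

```python
# Sketch: a "timed" reward that charges the agent for each planning step it
# takes before acting. step_cost stands in for the context-dependent penalty.
def timed_reward(env_reward: float, planning_steps: int, step_cost: float) -> float:
    """Reward the agent optimizes: task reward minus a compute penalty."""
    return env_reward - step_cost * planning_steps

# Planning longer only pays off if the extra compute buys enough extra reward.
cheap = timed_reward(env_reward=1.0, planning_steps=2, step_cost=0.05)    # 0.9
costly = timed_reward(env_reward=1.2, planning_steps=10, step_cost=0.05)  # 0.7
```

Under such a penalty, an agent is pushed to spend many planning steps only at decision points where deliberation meaningfully raises the task reward.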
Submission Number: 35
