Subgoal-Guided Policy Heuristic Search with Learned Subgoals

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Policy tree search is a family of tree search algorithms that use a policy to guide the search. These algorithms provide guarantees, based on the quality of the policy, on the number of expansions required to solve a given problem. While these algorithms have shown promising results, training the policy requires complete solution trajectories, which are obtained through a trial-and-error search process. When the training problem instances are hard, learning can be prohibitively costly, especially when starting from a randomly initialized policy, and search samples are wasted in failed attempts to solve these hard instances. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms. The subgoals, and the policies conditioned on them, are learned from the trees that the search expands while attempting to solve problems, including the search trees of failed attempts. We empirically show that our policy formulation and training method improve the sample efficiency of learning a policy and heuristic function in this online setting.
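For a concrete picture of how a subgoal-conditioned policy can steer a best-first search, here is a minimal sketch in the spirit of policy-guided heuristic search. It is not the paper's implementation: the environment interface (`expand`, `is_goal`), the subgoal proposer (`propose_subgoal`), and the subgoal-conditioned policy (`policy_prob`) are hypothetical placeholders, and the g(n)/π(n) priority is one common choice from the policy tree search family, assumed here for illustration.

```python
import heapq
import itertools
import math

def subgoal_guided_search(start, is_goal, expand, policy_prob,
                          propose_subgoal, budget=10_000):
    """Best-first search guided by a subgoal-conditioned policy (sketch).

    Hypothetical interfaces, not the paper's exact API:
      expand(s)                  -> iterable of (action, next_state, cost)
      policy_prob(s, a, subgoal) -> probability the subgoal-conditioned
                                    policy assigns to action a in state s
      propose_subgoal(s)         -> next waypoint for the search to aim at
    """
    counter = itertools.count()  # tie-breaker so states are never compared
    subgoal = propose_subgoal(start)
    # Node tuples: (priority, tiebreak, state, path_cost g, log policy prob)
    # PHS-style priority g(n) / pi(n), kept in log space for stability.
    open_list = [(0.0, next(counter), start, 0.0, 0.0)]
    closed = set()
    expansions = 0
    while open_list and expansions < budget:
        _, _, s, g, log_pi = heapq.heappop(open_list)
        if s in closed:
            continue
        closed.add(s)
        if is_goal(s):
            return s, expansions          # solved
        if s == subgoal:                  # waypoint reached:
            subgoal = propose_subgoal(s)  # condition on the next subgoal
        expansions += 1
        for a, s2, c in expand(s):
            p = max(policy_prob(s, a, subgoal), 1e-12)
            g2, lp2 = g + c, log_pi + math.log(p)
            priority = math.log(max(g2, 1e-12)) - lp2  # log(g / pi)
            heapq.heappush(open_list, (priority, next(counter), s2, g2, lp2))
    # Budget exhausted: a failed attempt, but the expanded tree still
    # records which subgoals were reached and can supply training data.
    return None, expansions
```

Even when the search fails, the tree it expanded records which subgoals were solved along the way; this is the kind of signal the paper exploits to learn from failed attempts.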
Lay Summary: This paper looks at a type of problem-solving method called tree search, where a system explores different options (like branches on a tree) to find a solution. These methods use a *policy* to guide the search: a recommendation to the algorithm of which actions to examine first. The quality of the policy determines how many options the algorithm must examine before finding a solution, so the better the policy, the faster the search. However, teaching the system an informative policy usually requires knowing the full solution to each training problem, which can be *expensive*, in terms of both time and the amount of *work* the search algorithm must do, when starting from scratch. Often the system wastes effort trying and failing, and those failures don't help it learn the policy. Our work proposes a new way to help the system learn more effectively by using *subgoals*, which can be thought of as a series of waypoints that make the big problem easier to solve. Instead of learning how to solve the entire problem, the system only needs to learn how to solve easier subgoals, which can be strung together to solve the global problem. This lets our method learn from failed attempts in which we do not solve the problem outright but do solve some of the subgoals. The results show that this approach helps the system learn faster and more efficiently, without sacrificing the quality of the policy.
Primary Area: Reinforcement Learning->Planning
Keywords: Tree search, heuristic search, policy tree search, planning
Submission Number: 9329