Value Function Learning via Prolonged Backward Heuristic Search

Published: 27 Apr 2023, Last Modified: 09 Jul 2023, PRL
Keywords: Search, Learning heuristics, Exploration, Prolonged Heuristic Search
TL;DR: An efficient exploration method for learning a value function to be used as a heuristic in search.
Abstract: In practical applications such as autonomous robots, we often need to solve similar problems repeatedly (e.g.\ replanning). Existing methods that improve search performance by learning from experience with similar, previously solved problems train the heuristic by imitating oracle data. However, such methods focus on generating data with an appropriate distribution (e.g.\ by aggregating data online) rather than on the computational cost of generating it. This cost becomes especially limiting for high-dimensional problems. Here, we present a search-inspired method for systematic model exploration that allows us to efficiently generate data and use all explored states for learning the value function, which can then be employed as a heuristic. Our method also improves the data distribution, as the search typically explores many states beyond the optimal path. Coverage can be improved even further with the Prolonged Search algorithm, which does not stop when a goal is reached but keeps the search running until an extended region around the optimal path has been explored. This, in turn, improves both the efficiency and robustness of subsequent planning. To address the negative effects of using a learned heuristic, we bound it with other heuristics to prevent (significant) overestimation of the cost-to-go and to ensure bounds on optimality even for non-i.i.d. or out-of-domain data. Our approach outperforms existing methods on benchmark problems and points to promising directions for developing efficient and robust search-based planning systems.
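The following is a minimal, illustrative sketch of the two ideas summarized in the abstract, not the paper's actual implementation: a backward Dijkstra-style search from the goal that labels every expanded state with its exact cost-to-go and keeps running past the start state (the "prolonged" part), and a wrapper that clamps a learned cost-to-go estimate against an admissible heuristic to limit overestimation. All names and parameters here (`prolonged_backward_search`, `bounded_heuristic`, `margin`, `w`) are assumptions made for illustration only.

```python
# Illustrative sketch only; the function names, parameters, and stopping rule
# are assumptions, not the paper's implementation.
import heapq

def prolonged_backward_search(goal, predecessors, cost, start, margin=1.5):
    """Backward Dijkstra from the goal: each expanded state gets an exact
    cost-to-go label. Instead of stopping when the start is reached, the
    search continues until expanded costs exceed margin * cost_to_go(start),
    so an extended region around the optimal path is labeled."""
    dist = {goal: 0.0}
    pq = [(0.0, goal)]
    labels = {}                    # state -> exact cost-to-go (training targets)
    stop_at = float("inf")         # stays infinite if the start is unreachable
    while pq:
        d, s = heapq.heappop(pq)
        if d > dist.get(s, float("inf")):
            continue               # stale queue entry
        if d > stop_at:
            break                  # prolonged region fully explored
        labels[s] = d
        if s == start:
            stop_at = margin * d   # prolong: keep searching past the start
        for p in predecessors(s):  # predecessors of s in the (backward) model
            nd = d + cost(p, s)
            if nd < dist.get(p, float("inf")):
                dist[p] = nd
                heapq.heappush(pq, (nd, p))
    return labels

def bounded_heuristic(h_learned, h_admissible, w=1.5):
    """Clamp a learned cost-to-go estimate between an admissible heuristic and
    w times that heuristic. Since h_admissible(s) <= h*(s), the clamped value
    never exceeds w * h*(s), which limits overestimation even on
    out-of-domain states."""
    def h(s):
        lo = h_admissible(s)
        return min(max(h_learned(s), lo), w * lo)
    return h
```

In this sketch, the `labels` dictionary would serve as supervised training targets for the value-function regressor, and `bounded_heuristic` would wrap that regressor before it is handed to the planner.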
Submission Number: 18