Data-Efficient Policy Selection for Navigation in Partial Maps via Subgoal-Based Abstraction

Abhishek Paudel; Gregory J. Stein

Published: 01 Jan 2023, Last Modified: 24 Sept 2024, IROS 2023, CC BY-SA 4.0

Abstract: We present a novel approach for fast and reliable policy selection for navigation in partial maps. Our robot plans with the recent learning-augmented, model-based Learning over Subgoals Planning (LSP) abstraction. It reuses data collected during navigation to evaluate how well alternative policies could have performed, via a procedure we call offline alt-policy replay. Costs from offline alt-policy replay constrain policy selection among the LSP-based policies during deployment, improving convergence speed, cumulative regret, and average navigation cost. With only limited prior knowledge about the nature of unseen environments, our experiments in simulated maze and office-like environments show improvements in cumulative regret of at least 67% and as much as 96% over the baseline bandit approach.
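To make the idea of replay costs "constraining" a bandit concrete, here is a minimal sketch of one way it could work. It is not the authors' method. It assumes a UCB-style selector for cost minimization, where each policy's optimistic cost is its mean deployed cost minus an exploration bonus. It further assumes, as a modeling choice, that costs from offline alt-policy replay act as lower bounds, so a policy's optimistic estimate is never allowed to drop below its mean replay cost. The replay procedure itself (re-planning with LSP on the collected map data) is not shown: replay costs are passed in as plain numbers. All names (`PolicyStats`, `optimistic_cost`, `select_policy`, `record`) and all cost values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PolicyStats:
    # Navigation costs observed when this policy was actually deployed.
    deployed: list = field(default_factory=list)
    # Costs estimated for this policy by offline replay on data collected while
    # *other* policies were deployed (assumed here to be lower bounds).
    replayed: list = field(default_factory=list)


def optimistic_cost(s: PolicyStats, t: int, c: float) -> float:
    """UCB-style lower confidence bound on expected cost, clipped by replay costs.

    t is the 1-based index of the current trial; c scales the exploration bonus.
    """
    if s.deployed:
        n = len(s.deployed)
        lcb = sum(s.deployed) / n - c * math.sqrt(2 * math.log(t) / n)
    else:
        lcb = -math.inf  # never deployed: maximally optimistic
    if s.replayed:
        # Assumed constraint: the estimate may not fall below the mean replay cost.
        lcb = max(lcb, sum(s.replayed) / len(s.replayed))
    return lcb


def select_policy(stats: list, t: int, c: float = 1.0) -> int:
    """Pick the policy with the lowest constrained optimistic cost (ties -> lowest index)."""
    return min(range(len(stats)), key=lambda i: optimistic_cost(stats[i], t, c))


def record(stats: list, chosen: int, cost: float, replay: dict) -> None:
    """Store the deployed cost and the replay estimates for the other policies."""
    stats[chosen].deployed.append(cost)
    for i, r in replay.items():
        if i != chosen:
            stats[i].replayed.append(r)


# Usage with made-up costs for three hypothetical policies.
stats = [PolicyStats() for _ in range(3)]
first = select_policy(stats, t=1)    # nothing known yet: policy 0
record(stats, first, 100.0, {1: 80.0, 2: 300.0})
second = select_policy(stats, t=2)   # replay makes policy 1 look promising
record(stats, second, 85.0, {0: 95.0, 2: 310.0})
third = select_policy(stats, t=3)    # policy 2 is still never deployed
```

In this toy run, policy 2 has a high replay cost, so its optimistic estimate stays at 305 and it is never tried. A plain UCB selector without the replay constraint would have to deploy it at least once, incurring regret. This illustrates, under the stated assumptions, how reusing deployment data could reduce cumulative regret.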