Abstract: We present a novel approach for fast and reliable policy selection for navigation in partial maps. Leveraging the recent learning-augmented model-based Learning over Subgoals Planning (LSP) abstraction to plan, our robot reuses data collected during navigation to evaluate how well other alternative policies could have performed via a procedure we call offline all-policy replay. Costs from offline alt-policy replay constrain policy selection among the LSP-based policies during deployment, allowing for improvements in convergence speed, cumulative regret and average navigation cost. With only lim-ited prior knowledge about the nature of unseen environments, we achieve at least 67% and as much as 96% improvements on cumulative regret over the baseline bandit approach in our experiments in simulated maze and office-like environments.
Loading