Keywords: policy selection, domain adaptation, navigation under uncertainty
TL;DR: We present an approach that enables a robot to run multiple learning and adaptation strategies in parallel during deployment and quickly pick the best-performing one.
Abstract: We present an approach for performant point-goal navigation in unfamiliar partially-mapped environments. When deployed, our robot runs multiple strategies for deployment-time learning and visual domain adaptation in parallel and quickly selects the best-performing one. Choosing between policies as they are learned or adapted between navigation trials requires continually updating estimates of their performance as they evolve. Leveraging recent work in model-based learning-informed planning under uncertainty, we determine lower bounds on the would-be performance of newly-updated policies on old trials without needing to re-deploy them. This information constrains and accelerates bandit-like policy selection, affording quick identification of the best-performing strategy shortly after it begins to yield good performance. We validate the effectiveness of our approach in simulated maze-like environments, showing reduced navigation cost and cumulative regret relative to existing baselines.
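To make the selection mechanism described in the abstract concrete, below is a minimal sketch of bound-constrained, bandit-like policy selection. It is not the paper's implementation: it assumes a generic lower-confidence-bound rule for cost minimization as a stand-in for the paper's selection strategy, and assumes the model-based planner supplies per-trial lower bounds on each policy's would-be navigation cost. All names (`BoundConstrainedSelector`, `select`, `update`) are hypothetical.

```python
import math


class BoundConstrainedSelector:
    """Bandit-like selection over concurrently adapting policies.

    A minimal sketch, assuming: each trial we deploy one policy and
    observe its navigation cost, and a model-based planner supplies,
    for every policy, a lower bound on the cost it would have incurred
    on that same trial, without re-deploying it. Selection uses an
    optimistic lower-confidence-bound rule for cost minimization, with
    the bounds clipping the optimism.
    """

    def __init__(self, num_policies, exploration=1.0):
        self.n = [0] * num_policies             # deployments per policy
        self.mean_cost = [0.0] * num_policies   # mean observed cost
        self.mean_bound = [0.0] * num_policies  # mean would-be cost lower bound
        self.trials = 0
        self.c = exploration

    def select(self):
        """Return the index of the policy to deploy next."""
        scores = []
        for i in range(len(self.n)):
            if self.n[i] == 0:
                optimistic = 0.0  # untried policy: maximally optimistic
            else:
                optimistic = self.mean_cost[i] - self.c * math.sqrt(
                    math.log(self.trials + 1) / self.n[i])
            # No policy can beat its model-derived cost lower bound, so
            # clip the optimism; this pruning is what constrains and
            # accelerates the bandit-like selection.
            scores.append(max(optimistic, self.mean_bound[i]))
        return min(range(len(scores)), key=scores.__getitem__)

    def update(self, deployed, observed_cost, cost_lower_bounds):
        """Record one trial: the deployed policy's observed cost plus
        would-be cost lower bounds for all policies on that trial.
        (That policies evolve between trials is ignored in this sketch.)"""
        self.trials += 1
        self.n[deployed] += 1
        self.mean_cost[deployed] += (
            observed_cost - self.mean_cost[deployed]) / self.n[deployed]
        for i, lb in enumerate(cost_lower_bounds):
            self.mean_bound[i] += (lb - self.mean_bound[i]) / self.trials
```

In this sketch, the clipping step is the key effect of the bounds: a newly-updated policy whose lower-bounded cost already exceeds the incumbent's mean is never selected, so exploration concentrates on policies the bounds cannot rule out.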
Supplementary Material: zip
Spotlight Video: mp4
Video: https://youtu.be/nDPjyIE7-5c
Publication Agreement: pdf
Student Paper: yes
Submission Number: 258