Keywords: Language-conditioned navigation policy, data augmentation
TL;DR: Data augmentation of in-the-wild video for learning a language-conditioned navigation policy.
Abstract: We present LeLaN, a method that uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models, grounded with pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects, with varied granularity and noise in their descriptions. Using this method, we label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data, and train a policy that outperforms state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.
Supplementary Material: zip
Video: https://www.youtube.com/watch?v=-zTyhhu0NTY
Website: https://learning-language-navigation.github.io/
Code: https://github.com/NHirose/learning-language-navigation
Publication Agreement: pdf
Student Paper: no
Spotlight Video: mp4
Submission Number: 711