Keywords: Language-conditioned navigation policy, data augmentation
TL;DR: Data augmentation of in-the-wild video for learning a language-conditioned navigation policy.
Abstract: We present LeLaN, a method that uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models, grounded with pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects, with varied granularity and noise in their descriptions. Using this method, we label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data, and train a policy that outperforms state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.
Supplementary Material: zip
Video: https://www.youtube.com/watch?v=-zTyhhu0NTY
Website: https://learning-language-navigation.github.io/
Code: https://github.com/NHirose/learning-language-navigation
Publication Agreement: pdf
Student Paper: no
Spotlight Video: mp4
Submission Number: 711