Visual Pre-training for Navigation: What Can We Learn from Noise?

Yanwei Wang; Ching-Yun Ko; Pulkit Agrawal

03 Oct 2022 (modified: 06 Jul 2025), NeurIPS 2022 SyntheticData4ML, Readers: Everyone

Keywords: Synthetic Noise Data, Self-Supervised Learning, Embodied AI, Visual Navigation

TL;DR: Auxiliary tasks for visual navigation learned on synthetic noise transfer well to photo-realistic home images.

Abstract: In visual navigation, one powerful paradigm is to predict actions directly from observations. Training such an end-to-end system allows representations that are useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data-hungry. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. Code is available at https://github.com/yanweiw/noise2ptz
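
The random crop prediction task described in the abstract can be illustrated with a minimal sketch of how one self-supervised training pair might be generated. This is not the authors' implementation (see the linked repository for that). The uniform pixel noise, the function names `make_noise_image` and `sample_crop`, the `min_frac` parameter, and the normalized (center x, center y, width, height) label are all assumptions chosen for illustration.

```python
import random


def make_noise_image(height, width, rng):
    """Return a height x width grayscale image of uniform noise in [0, 1).

    Uniform noise is an assumption; the abstract only says "synthetic noise".
    """
    return [[rng.random() for _ in range(width)] for _ in range(height)]


def sample_crop(image, rng, min_frac=0.2):
    """Sample a random axis-aligned crop and the label a model would predict.

    The label is (center_x, center_y, width, height), each normalized by the
    image size. This parameterization is hypothetical.
    """
    height, width = len(image), len(image[0])
    # Crop size: at least min_frac of each side, at most the full side.
    crop_h = rng.randint(max(1, int(min_frac * height)), height)
    crop_w = rng.randint(max(1, int(min_frac * width)), width)
    # Top-left corner chosen so the crop stays inside the image.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    label = (
        (left + crop_w / 2) / width,
        (top + crop_h / 2) / height,
        crop_w / width,
        crop_h / height,
    )
    return crop, label


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = make_noise_image(64, 64, rng)  # stands in for the "current view"
crop, label = sample_crop(image, rng)  # stands in for the "goal view"
```

In this sketch the (image, crop) pair plays the role of the current view and the goal view, and the label is the regression target. No manual annotation is needed because the label comes from the sampling step itself.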

Supplementary Material: zip

