Keywords: self-supervised learning, navigation, vision
TL;DR: We propose a self-supervised learning approach for learning visual features for robotic navigation.
Abstract: Self-supervised learning has revolutionized computer vision and natural language processing. Despite its potential, its application to robotic navigation remains under-explored, largely because effective self-supervision signals are difficult to define for robotics. Fortunately, with the recent release of many large-scale robotic navigation datasets containing a variety of sensor and action data that can serve as self-supervision signals, self-supervised learning has become a viable approach for robotic navigation tasks. In this work, we propose a self-supervised method for learning visual features for end-to-end robotic navigation systems, using actions as the supervisory signal. The approach is motivated by the observation that humans focus on specific regions of their frontal view when making navigation decisions and producing navigation actions. We reverse this procedure, using future actions to learn only the visual features that matter for navigation, in contrast to conventional computer vision models, which extract every detail of the environment, including details that can mislead a downstream navigation controller. Our results show that this approach enables small convolutional neural network-based visual encoders to achieve performance comparable to large vision foundation models trained on billions of images, demonstrating the scalability and effectiveness of our self-supervised learning method for robotic navigation.
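To make the action-as-supervision idea concrete, below is a minimal PyTorch sketch of one plausible instantiation: a small CNN encoder paired with a head that regresses logged future actions from the current camera frame. All names (`SmallVisualEncoder`, `ActionPretextHead`), the action horizon, and the MSE loss are assumptions for illustration, not the paper's released code or exact architecture.

```python
# Hypothetical sketch: pretrain a compact CNN visual encoder by predicting
# future navigation actions (the self-supervision signal) from a single frame.
import torch
import torch.nn as nn

class SmallVisualEncoder(nn.Module):
    """Compact CNN mapping an RGB frame to a feature vector."""
    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> [B, 128, 1, 1]
        )
        self.fc = nn.Linear(128, feature_dim)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(img).flatten(1))

class ActionPretextHead(nn.Module):
    """Predicts the next `horizon` action commands (e.g. linear/angular velocity)."""
    def __init__(self, feature_dim: int = 256, horizon: int = 5, action_dim: int = 2):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * action_dim),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(feat).view(-1, self.horizon, self.action_dim)

# One pretext training step: regress logged future actions from the frame,
# so the encoder retains only navigation-relevant visual features.
encoder, head = SmallVisualEncoder(), ActionPretextHead()
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)

images = torch.randn(8, 3, 96, 96)     # dummy batch of camera frames
future_actions = torch.randn(8, 5, 2)  # dummy logged actions over the next 5 steps

pred = head(encoder(images))
loss = nn.functional.mse_loss(pred, future_actions)
opt.zero_grad()
loss.backward()
opt.step()
```

After pretraining, the head would be discarded and the encoder's features passed to a downstream navigation controller; the specific action parameterization and horizon would depend on the dataset used.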
Submission Number: 8