SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding

Published: 21 Sept 2023, Last Modified: 02 Nov 2023 · NeurIPS 2023 poster
Keywords: neural maps, visual positioning, semantic mapping
TL;DR: Learn semantic neural maps from self-supervision with state-of-the-art visual positioning
Abstract: Semantic 2D maps are commonly used by humans and machines for navigation purposes, whether it's walking or driving. However, these maps have limitations: they lack detail, often contain inaccuracies, and are difficult to create and maintain, especially in an automated fashion. Can we use _raw imagery_ to automatically create _better maps_ that can be easily interpreted by both humans and machines? We introduce SNAP, a deep network that learns rich 2D _neural_ maps from ground-level and overhead images. We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images. SNAP can resolve the location of challenging image queries beyond the reach of traditional methods, outperforming the state of the art in localization by a large margin. Moreover, our neural maps encode not only geometry and appearance but also high-level semantics, discovered without explicit supervision. This enables effective pre-training for data-efficient semantic scene understanding, with the potential to unlock cost-efficient creation of more detailed maps.
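As a rough illustration of the pose-supervised alignment idea described in the abstract (this is a minimal sketch, not the authors' formulation; the map shapes, the cosine-similarity score, and the exhaustive `localize` search below are assumptions made for illustration), a bird's-eye-view neural map estimated from ground-level images can be scored against one estimated from overhead imagery by comparing features of co-located grid cells, and a query can be localized by searching for the offset that maximizes that score:

```python
import numpy as np

def map_similarity(ground_map: np.ndarray, aerial_map: np.ndarray) -> float:
    """Mean cosine similarity between two (H, W, D) neural maps.

    Both maps are assumed to live in the same metric grid, i.e. the
    ground-level map has already been placed into the aerial frame using
    the known camera pose (that placement step is simplified away here).
    """
    g = ground_map / (np.linalg.norm(ground_map, axis=-1, keepdims=True) + 1e-8)
    a = aerial_map / (np.linalg.norm(aerial_map, axis=-1, keepdims=True) + 1e-8)
    return float((g * a).sum(axis=-1).mean())

def localize(ground_map: np.ndarray, aerial_map: np.ndarray, stride: int = 1):
    """Toy exhaustive localization: slide the ground-level map over the
    aerial map and return the 2D cell offset with the highest similarity."""
    gh, gw, _ = ground_map.shape
    ah, aw, _ = aerial_map.shape
    best_score, best_offset = -np.inf, (0, 0)
    for i in range(0, ah - gh + 1, stride):
        for j in range(0, aw - gw + 1, stride):
            score = map_similarity(ground_map, aerial_map[i:i + gh, j:j + gw])
            if score > best_score:
                best_score, best_offset = score, (i, j)
    return best_offset, best_score
```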
Submission Number: 4649