Factorized Neural Radiance Field with Depth Covariance Function for Dense RGB Mapping

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Simultaneous Localization and Mapping (SLAM), Neural Radiance Field (NeRF)
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: real-time neural implicit mapping and depth estimation
Abstract: Reconstructing high-quality dense maps in real time is critical for building 3D environments for robot sensing and navigation. Recently, the Neural Radiance Field (NeRF) has garnered great attention for its excellent capacity to represent 3D scenes; consequently, recent works leverage NeRF to learn 3D maps, typically from RGB-D cameras. However, depth sensors are not available on all devices, whereas RGB cameras are cheap and widely applicable. We therefore propose to reconstruct scenes with NeRF from a single RGB input, which is highly challenging without the geometric guidance of depth sensors, and we pursue real-time capability with a lightweight implementation. In this paper, we propose FMapping, a factorized NeRF mapping framework that enables high-quality, real-time reconstruction from RGB input alone. Our key insight is that depth changes little across consecutive RGB frames, so geometric cues can be derived effectively from RGB given well-estimated depth priors. Specifically, we divide mapping into 1) an initialization stage and 2) an on-the-fly stage. First, since trackers are not always stable during initialization, we start from noisy pose input and optimize the map by exploiting geometric consistency between volume rendering and the signed distance function in a self-supervised way to capture depth accurately. Second, because the optimization budget per frame is short under real-time constraints, we model depth estimation as a Gaussian process (GP) with a pre-trained, cost-effective depth covariance function to promptly infer depth conditioned on previous frames. The per-pixel depth estimates and their uncertainties then guide the NeRF sampling process: we densely allocate sample points within adjustable truncation regions near the surface and assign additional samples to pixels with high uncertainty. In this way, we can continue building the map from subsequent poses once the tracker has stabilized. Experiments demonstrate that our framework outperforms state-of-the-art RGB-based mapping and achieves performance comparable to RGB-D mapping in photometric and geometric accuracy, with real-time, scale-consistent depth estimation at around 5 Hz.
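The GP depth prior described in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the paper's implementation: it conditions a GP on depths at sparse reference pixels (e.g., from previous frames) to obtain a per-pixel depth mean and uncertainty at query pixels. The paper uses a pre-trained, learned depth covariance function; here a plain RBF kernel over pixel coordinates stands in for it, and all names (`rbf_kernel`, `gp_depth_posterior`, the `noise` level) are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=16.0, variance=1.0):
    # Stand-in for the paper's pre-trained depth covariance function:
    # a plain RBF over pixel coordinates. The real covariance is learned.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_depth_posterior(ref_xy, ref_depth, query_xy, noise=1e-2):
    """Condition a GP on reference-pixel depths to obtain per-pixel
    depth mean and standard deviation at query pixels.

    ref_xy: (R, 2) pixel coordinates with known/estimated depths
    ref_depth: (R,) depths at those pixels
    query_xy: (Q, 2) pixel coordinates to infer depth for
    """
    K_rr = rbf_kernel(ref_xy, ref_xy) + noise * np.eye(len(ref_xy))
    K_qr = rbf_kernel(query_xy, ref_xy)
    K_qq = rbf_kernel(query_xy, query_xy)
    # Standard GP regression via a Cholesky factorization of K_rr.
    L = np.linalg.cholesky(K_rr)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ref_depth))
    mean = K_qr @ alpha
    v = np.linalg.solve(L, K_qr.T)
    var = np.clip(np.diag(K_qq) - (v * v).sum(0), 1e-8, None)
    return mean, np.sqrt(var)
```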
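Likewise, the uncertainty-guided sampling step can be sketched as follows: ray samples are concentrated in a truncation band around the GP depth mean, with wider bands and larger sample budgets for rays whose depth is uncertain. The band width `trunc * sigma` and the proportional per-ray budget rule are assumptions for illustration, not the paper's exact allocation scheme.

```python
def allocate_samples(depth_mean, depth_std, n_avg=64, trunc=3.0):
    """Place ray samples inside a truncation band around the predicted
    surface, spending more of the total budget on uncertain rays.

    depth_mean, depth_std: (Q,) GP posterior per pixel
    n_avg: average number of samples per ray
    trunc: band half-width in units of posterior standard deviation
    """
    # Distribute the total budget proportionally to uncertainty,
    # with a small floor so every ray gets some samples. (Assumed rule.)
    w = depth_std / depth_std.sum()
    n_per_ray = np.maximum(4, np.round(w * n_avg * len(depth_std)).astype(int))
    t_vals = []
    for mu, sigma, n in zip(depth_mean, depth_std, n_per_ray):
        half = trunc * sigma  # adjustable truncation region near the surface
        t_vals.append(np.linspace(mu - half, mu + half, n))
    return t_vals
```

For example, feeding `gp_depth_posterior` depths from the previous keyframe and passing the result to `allocate_samples` gives confident rays a tight band of few samples, while uncertain rays receive both a wider band and more of the budget.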
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5930