Abstract: Monocular SLAM has received significant attention because it requires only RGB inputs and removes the need for complex sensor setups. However, existing monocular SLAM systems lack accurate depth estimation, which limits tracking and mapping performance. To address this limitation, we propose MoD-SLAM, the first monocular NeRF-based dense mapping method that enables real-time 3D reconstruction in unbounded scenes. Specifically, we introduce a depth estimation module in the front-end to extract accurate prior depth values that supervise the mapping and tracking processes. This strategy is essential to improving SLAM performance. Moreover, a Gaussian-based unbounded scene representation is designed to address the challenge of mapping scenes without boundaries. By introducing a robust depth loss term into the tracking process, our SLAM system achieves more precise pose estimation in large-scale scenes. Experiments on two standard datasets show that MoD-SLAM achieves competitive performance, improving 3D reconstruction accuracy by up to 30% and localization accuracy by up to 15% compared with existing monocular SLAM systems.