SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding

Xudong Lv; Zhiwei He; Yuxiang Yang; Jiahao Nie; Jing Zhang

Published: 20 Jul 2024, Last Modified: 21 Jul 2024. Venue: MM2024 Poster. License: CC BY 4.0

Abstract: Neural implicit representations have recently revolutionized simultaneous localization and mapping (SLAM), giving rise to a groundbreaking paradigm known as NeRF-based SLAM. However, existing methods often fall short in accurately estimating poses and reconstructing scenes. This limitation largely stems from their reliance on volume rendering techniques, which oversimplify the modeling process. In this paper, we introduce a novel neural implicit SLAM system designed to address these shortcomings. Our approach reconstructs Neural Radiance Fields (NeRFs) using a self-attentive architecture and represents scenes through neural point cloud encoding. Unlike previous NeRF-based SLAM methods, which depend on traditional volume rendering equations for scene representation and view synthesis, our method employs a self-attentive rendering framework with the Transformer architecture during mapping and tracking stages. To enable incremental mapping, we anchor scene features within a neural point cloud, striking a balance between estimation accuracy and computational cost. Experimental results across three challenging datasets demonstrate the superior performance and robustness of our proposed approach compared to recent NeRF-based SLAM systems. The code will be released.
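The contrast the abstract draws, between classical volume rendering and attention-based aggregation of samples along a ray, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the function names (`volume_render_weights`, `attention_weights`, `composite`), the single-query attention, and the toy inputs below are all assumptions chosen for illustration. The first function uses the standard NeRF quadrature weights w_i = T_i (1 - exp(-sigma_i delta_i)); the second uses generic scaled dot-product attention weights in their place.

```python
import math


def volume_render_weights(sigmas, deltas):
    """Standard NeRF quadrature weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # nothing occludes the first sample on the ray
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light remaining for later samples
    return weights


def attention_weights(query, keys):
    """Scaled dot-product attention of one query over per-sample key vectors.

    Hypothetical stand-in for a learned renderer: the weights come from
    feature similarity instead of a fixed physical compositing rule.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def composite(weights, colors):
    """Weighted sum of per-sample RGB colors along the ray."""
    return [sum(w * c[ch] for w, c in zip(weights, colors)) for ch in range(3)]


# Toy ray with three samples (all values are illustrative assumptions).
colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
classic = composite(volume_render_weights([0.5, 2.0, 4.0], [0.1, 0.1, 0.1]), colors)
learned = composite(attention_weights([1.0, 0.0], [[0.2, 0.1], [1.5, 0.3], [0.4, 0.9]]), colors)
```

In the classical rule the weights are fixed by densities and sample spacing and can sum to less than one, whereas softmax attention weights always sum to one and depend on features that, in a trained system, would be learned.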

Primary Subject Area: [Experience] Multimedia Applications

Secondary Subject Area: [Content] Vision and Language

Relevance To Conference: We introduce a novel neural implicit SLAM system [1, 2] that reconstructs Neural Radiance Fields (NeRFs) [3] using a self-attentive architecture and represents scenes through neural point cloud encoding. We highlight the emergence of NeRF-based SLAM [4], which utilizes neural implicit representations for 3D scene reconstruction [5]. This technique has significant implications for multimedia applications, as it offers more accurate and detailed reconstructions of environments, which can enhance immersive experiences in AR, VR, and other multimedia content.

[1] Xuan Shao, et al. 2020. A Tightly-coupled Semantic SLAM System with Visual, Inertial and Surround-view Sensors for Autonomous Indoor Parking. (ACM MM '20).
[2] Wanting Li, et al. 2023. ColSLAM: A Versatile Collaborative SLAM System for Mobile Phones Using Point-Line Features and Map Caching. (ACM MM '23).
[3] Junyi Zeng, et al. 2023. Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing. (ACM MM '23).
[4] Zhihao Li, et al. 2023. PI-NeRF: A Partial-Invertible Neural Radiance Fields for Pose Estimation. (ACM MM '23).
[5] Chen Wang, et al. 2023. Digging into Depth Priors for Outdoor Neural Radiance Fields. (ACM MM '23).

Supplementary Material: zip

Submission Number: 1536
