Keywords: 3D Gaussian splatting, novel view synthesis, camera pose estimation
Abstract: Recent advances in neural rendering for 3D reconstruction have focused on constructing representations directly from uncalibrated RGB images, bypassing the need for Structure-from-Motion (SfM) preprocessing. A primary challenge in this setting is the joint optimization of scene geometry and camera parameters, a task fraught with inherent ambiguities. Although 3D Gaussian Splatting (3DGS) achieves photorealistic reconstruction quality, its discrete, point-based representation complicates this joint optimization. To address these challenges, we propose a robust, SfM-free framework that leverages pre-trained 3D feed-forward models within a coarse-to-fine alignment pipeline. Our method employs Pi3 for scene initialization and then jointly trains geometry and camera poses. To stabilize camera pose optimization, we apply 3D and 2D filters to regularize the gradients arising from signal alignment. Furthermore, we incorporate a geometric regularization based on image matching that provides global constraints for camera pose refinement, significantly improving both reconstruction quality and pose estimation accuracy. Our method achieves competitive performance in novel view synthesis and camera pose estimation, demonstrating robustness across diverse datasets.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9094