Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting

TMLR Paper4667 Authors

14 Apr 2025 (modified: 12 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Sparse-view scene reconstruction often faces significant challenges due to the constraints imposed by limited observational data. These limitations result in incomplete information, leading to suboptimal reconstructions with existing methodologies. To address this, we present Intern-GS, a novel approach that effectively leverages rich prior knowledge from vision foundation models to enhance sparse-view Gaussian splatting, thereby enabling high-quality scene reconstruction. Specifically, Intern-GS uses vision foundation models to guide both the initialization and the optimization of 3D Gaussian splatting, effectively addressing the limitations of sparse inputs. During initialization, our method first employs DUSt3R to generate a dense Gaussian point cloud. This significantly alleviates the limitations of traditional structure-from-motion (SfM) methods, which often struggle under sparse-view constraints. However, directly using DUSt3R tends to introduce substantial redundancy. To mitigate this, we propose a redundancy-free strategy that leverages confidence scores to remove overlapping regions across frames. During optimization, we propose a hybrid regularization strategy that jointly constrains both observed and unobserved views in terms of color and geometry, guiding 3DGS optimization toward more accurate reconstructions. Extensive experiments demonstrate that Intern-GS achieves state-of-the-art rendering quality across diverse datasets, including both forward-facing and large-scale scenes, such as LLFF, DTU, and Tanks and Temples.
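The abstract does not spell out how the confidence-based redundancy removal works; one plausible realization, sketched below under assumptions not stated in the paper, is to treat DUSt3R points from different frames that land in the same voxel as overlapping and keep only the highest-confidence point per voxel. The function name, voxel-grid strategy, and `voxel_size` parameter are illustrative, not the authors' actual method.

```python
import numpy as np

def prune_redundant_points(points, confidences, voxel_size=0.05):
    """Hypothetical redundancy-free merging: points (N, 3) from all frames
    that quantize to the same voxel are treated as overlapping, and only
    the point with the highest confidence score is retained."""
    # Quantize 3D positions into integer voxel indices.
    voxels = np.floor(points / voxel_size).astype(np.int64)
    # Sort by confidence, descending, so the best point in each voxel
    # appears first.
    order = np.argsort(-confidences)
    voxels, points = voxels[order], points[order]
    # np.unique with return_index keeps the first occurrence of each
    # voxel row, i.e. the highest-confidence point per voxel.
    _, keep = np.unique(voxels, axis=0, return_index=True)
    return points[keep]
```

With this kind of filter, densely overlapping frame-wise point maps collapse to one representative point per occupied voxel, which keeps the initialization dense where coverage is unique but removes the cross-frame duplicates the abstract describes.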
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Derek_Hoiem1
Submission Number: 4667