Completing Explicit 3D Reconstruction via View Extrapolation with Diffusion Priors

Published: 18 Apr 2025, Last Modified: 15 May 2025, ICRA 2025 FMNS Poster, CC BY 4.0
Keywords: sparse view 3d reconstruction, view extrapolation, image diffusion model, novel view synthesis
TL;DR: We introduce a pipeline that pairs a 2D diffusion prior directly with a 3D foundation model to generate and align novel views from sparse inputs, achieving realistic 3D scene completion and superior geometric consistency.
Abstract: Completing 3D scenes from limited observations requires both optimization and generation, but existing methods often overfit to the input views, making it difficult to produce realistic images at extrapolated viewpoints. To address this issue, we propose a pipeline that uses a 2D diffusion prior explicitly with a 3D foundation model for view extrapolation and scene completion. The key idea is to harness the diffusion model's prior more directly, rather than refining or inpainting defective rendering results. A robust 3D reconstruction model, MASt3R, provides depth and normal maps from the input images, making it possible to create reliable warped images from the reference views. Our diffusion model takes these warped images as conditioning inputs for view extrapolation, ensuring that the generated images accurately align with the query poses. Furthermore, our method enforces geometric consistency across all views by adopting a divide-and-conquer strategy during alignment, incorporating newly generated information into the 3D scene and using it to create updated warped images. We validate our approach on multiple categories from the CO3D dataset, demonstrating superior extrapolation performance, realistic appearance, and enhanced 3D consistency compared to both 3D Gaussian Splatting-based and other diffusion-based baselines.
Supplementary Material: pdf
Submission Number: 19
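
To make the warping step in the abstract concrete (unprojecting reference pixels with predicted depth and reprojecting them into a query pose), here is a minimal NumPy sketch. All names (`warp_reference_to_query`, `T_ref2query`, etc.) are hypothetical and not from the paper; the actual pipeline obtains depth from MASt3R and passes the warped result to a diffusion model, neither of which is reproduced here.

```python
import numpy as np

def warp_reference_to_query(image, depth, K, T_ref2query):
    """Forward-warp a reference view into a query pose using per-pixel depth.

    image:        (H, W, C) reference view
    depth:        (H, W) per-pixel z-depth in the reference camera frame
    K:            (3, 3) camera intrinsics (assumed shared by both views)
    T_ref2query:  (4, 4) rigid transform from reference to query camera
    Returns the warped image and a validity mask (False where holes remain).
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, row-major to match image.reshape.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, N)

    # Unproject to 3D points in the reference camera frame.
    pts_ref = np.linalg.inv(K) @ pix * depth.reshape(1, -1)

    # Transform into the query camera frame.
    pts_h = np.vstack([pts_ref, np.ones((1, pts_ref.shape[1]))])
    pts_query = (T_ref2query @ pts_h)[:3]

    # Project into the query image plane.
    proj = K @ pts_query
    z = proj[2]
    uq = np.round(proj[0] / np.maximum(z, 1e-6)).astype(int)
    vq = np.round(proj[1] / np.maximum(z, 1e-6)).astype(int)
    inside = (z > 1e-6) & (uq >= 0) & (uq < W) & (vq >= 0) & (vq < H)

    # Z-buffered splat: nearer points overwrite farther ones.
    warped = np.zeros_like(image)
    zbuf = np.full((H, W), np.inf)
    src = image.reshape(-1, image.shape[-1])
    for i in np.flatnonzero(inside):
        y, x = vq[i], uq[i]
        if z[i] < zbuf[y, x]:
            zbuf[y, x] = z[i]
            warped[y, x] = src[i]
    return warped, zbuf < np.inf
```

A production version would vectorize the splat and could use the predicted normal maps to discard grazing-angle pixels; the holes flagged by the mask are exactly the regions the conditioned diffusion model is asked to fill.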