Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

Published: 23 Mar 2025, Last Modified: 24 Mar 2025
Venue: 3DV 2025 Poster
License: CC BY 4.0
Keywords: scene generation, depth inpainting
Abstract: 3D scene generation has quickly become a challenging new research direction, fueled by consistent improvements of 2D generative diffusion models. Current methods generate scenes by iteratively stitching newly generated images with existing geometry, using pre-trained monocular depth estimators to lift the generated images to 3D. The predicted depth is fused with the existing scene representation through various alignment operations. In this work, we make two fundamental contributions to the field of 3D scene generation. First, we note that lifting images to 3D with a monocular depth estimation model is suboptimal as it ignores the geometry of the existing scene, thus prompting the need for alignment. We introduce a depth completion model to directly learn the 3D fusion process, resulting in improved geometric coherence of generated scenes. Second, we introduce a new benchmark to evaluate the geometric accuracy of scene generation methods. We show that the commonly used CLIP score between scene prompts and images is unsuitable for measuring the geometric quality of a scene and introduce a depth-based metric. Our benchmark thus offers an additional dimension to gauge the quality of generated scenes.
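The "alignment operations" mentioned in the abstract are, in common practice, a least-squares scale-and-shift fit between the monocular depth prediction and the depth rendered from the existing scene, computed over their overlapping pixels. The sketch below illustrates that baseline, not the paper's depth completion model; all function names are ours, written in NumPy:

import numpy as np

def align_scale_shift(pred_depth, scene_depth, overlap_mask):
    """Fit scale s and shift t so that s * pred + t ~= scene on the overlap."""
    p = pred_depth[overlap_mask].reshape(-1, 1)
    g = scene_depth[overlap_mask].ravel()
    A = np.hstack([p, np.ones_like(p)])             # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # least-squares fit
    return s * pred_depth + t                       # globally aligned depth

def fuse_depth(pred_depth, scene_depth, overlap_mask):
    """Keep known scene depth; fill new pixels with the aligned prediction."""
    aligned = align_scale_shift(pred_depth, scene_depth, overlap_mask)
    return np.where(overlap_mask, scene_depth, aligned)

Because the fit is global, such alignment cannot correct local inconsistencies at the seam between old and new content, which is the shortcoming the paper's depth completion model is designed to address.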
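The abstract does not specify the depth-based metric. Continuing the NumPy sketch above, one plausible instantiation, shown purely for illustration (the benchmark's actual metric may differ), is the absolute relative depth error after median scaling between a rendering of the generated scene and a reference depth map:

def abs_rel_depth_error(rendered_depth, reference_depth, mask):
    """Absolute relative error after median scaling (hypothetical metric)."""
    r = rendered_depth[mask]
    g = reference_depth[mask]
    r = r * (np.median(g) / np.median(r))  # remove global scale ambiguity
    return float(np.mean(np.abs(r - g) / g))

Unlike a CLIP score, which compares rendered images to the scene prompt and is blind to geometry, a measure of this kind directly penalizes depth errors in the generated scene.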
Supplementary Material: zip
Submission Number: 387