Abstract:
Recent advances in learnable spatial representation structures, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting, have improved 3D scene reconstruction from 2D images, with gains in computational efficiency, scalability, and memory efficiency. However, in multi-view settings, reconstruction quality degrades in regions with limited field-of-view (FOV) overlap, especially in forward-facing scene datasets. To address this, we assume extended camera poses and apply Depth Image-Based Rendering (DIBR) as data augmentation, generating new views beyond the original FOV. Additionally, we employ diffusion models to synthesize new viewpoints in data-scarce areas and fine-tune them with Low-Rank Adaptation (LoRA) to maintain spatial consistency with the existing views. By combining extended camera poses, DIBR, and diffusion models, our approach significantly improves reconstruction quality in outer regions. It is effective in both single-image and multi-view setups, enhancing 3D reconstruction under sparse camera coverage and limited training data.
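The core augmentation step described above is DIBR: forward-warping an existing RGB-D view into an extended camera pose to create a training view outside the original FOV. The sketch below illustrates that idea only; the function name, the pinhole/NumPy formulation, and the inputs (src_rgb, depth, intrinsics K, relative pose T_src2tgt) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dibr_forward_warp(src_rgb, depth, K, T_src2tgt):
    """Warp src_rgb (H,W,3) with per-pixel depth (H,W) into a target pose.

    K         : (3,3) shared pinhole intrinsics (assumed identical cameras)
    T_src2tgt : (4,4) rigid transform from source to target camera frame
    Returns the warped image; disoccluded pixels are left black (holes).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # Back-project source pixels to 3D points in the source camera frame.
    z = depth.reshape(-1)
    pix = np.stack([u.reshape(-1), v.reshape(-1), np.ones_like(z)], axis=0)
    pts_src = np.linalg.inv(K) @ (pix * z)                      # (3, H*W)

    # Transform the points into the extended (target) camera frame.
    pts_h = np.vstack([pts_src, np.ones((1, pts_src.shape[1]))])
    pts_tgt = (T_src2tgt @ pts_h)[:3]

    # Re-project into the target image plane.
    proj = K @ pts_tgt
    z_tgt = proj[2]
    u_t = np.round(proj[0] / np.clip(z_tgt, 1e-6, None)).astype(int)
    v_t = np.round(proj[1] / np.clip(z_tgt, 1e-6, None)).astype(int)

    warped = np.zeros_like(src_rgb)
    valid = (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H) & (z_tgt > 0)
    # Nearest-pixel splatting; a full pipeline would z-buffer collisions and
    # inpaint the remaining holes (here, the paper's diffusion + LoRA stage
    # fills content the warp cannot provide).
    warped[v_t[valid], u_t[valid]] = src_rgb.reshape(-1, 3)[valid]
    return warped
```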