Keywords: multi-view, scene completion, casual captures, novel view synthesis, 3D Gaussian splatting, neural radiance fields (NeRF), uncalibrated reconstruction, generative 3D models
TL;DR: Fillerbuster is a unified inpainting framework for scene completion that jointly models images and camera poses to reconstruct and complete missing parts of casually captured scenes.
Abstract: We present Fillerbuster, a unified model that completes unknown regions of a 3D scene with a multi-view latent diffusion transformer. Casual captures are often sparse, missing surrounding content behind objects or above the scene. Existing methods are not suited to this setting: they either focus on making known pixels look good with sparse-view priors, or on hallucinating the unseen sides of objects from just one or two photos. In practice, we often have hundreds of input frames and want to complete regions that are missing and unobserved in all of them. Our solution is to train a generative model that consumes a large context of input frames while generating unknown target views and recovering image poses when camera parameters are unavailable. We show results completing partial captures on two existing datasets, and we introduce an uncalibrated scene-completion task in which our unified model both predicts poses and creates new content. Our flexible, unified inpainting framework predicts many images and poses jointly, inpainting all inputs together, and could be extended to further modalities such as depth. We open-source our framework for integration into popular reconstruction platforms such as Nerfstudio and Gsplat.
Submission Number: 272