- Keywords: scene reconstruction, differentiable rendering
- Abstract: We introduce a novel scene reconstruction method that infers a fully editable and re-renderable model of a 3D road scene from a single image. We represent movable objects separately from the immovable background, and recover a full 3D model of each distinct object together with the spatial relations among the objects in the scene. Building on transformer-based detectors and neural implicit 3D representations, we construct a Scene Decomposition Network (SDN) that reconstructs the scene; the reconstruction can further be used in analysis-by-synthesis via differentiable rendering. Although trained only on simulated road scenes, our method generalizes well to real data of the same class without any adaptation, thanks to its strong inductive priors. Experiments on two synthetic-real dataset pairs (PD-DDAD and VKITTI-KITTI) show that our method robustly recovers scene geometry and appearance, and can reconstruct and re-render the scene from novel viewpoints.
- Supplementary Material: zip