Keywords: generative model, neural rendering, causality, unsupervised learning, computer vision
TL;DR: We propose the first unsupervised framework for inference of explicit object-centric 3D scene representations that generalizes to out-of-distribution scenes.
Abstract: We present a novel causal generative model for unsupervised object-centric 3D scene understanding that generalizes robustly to out-of-distribution images. This model is trained to reconstruct multi-view images via a latent representation describing the shapes, colours and positions of the 3D objects they show. We then propose an inference algorithm that recovers this latent representation given a single out-of-distribution image as input. We conduct extensive experiments applying our approach to test datasets that have zero probability under the training distribution. Our approach significantly outperforms baselines that do not capture the true causal image generation process.