INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision

Published: 25 Mar 2022, Last Modified: 05 May 2023
ICLR2022 OSC Poster
Readers: Everyone
Keywords: object-centric, representation learning, nerf, differentiable renderer, autoencoder
TL;DR: Learn to decompose scenes into NeRF objects, disentangling their identity and pose.
Abstract: We propose INFERNO, a method to infer object-centric representations of visual scenes without annotations. Our method decomposes a scene into multiple objects, with each object having a structured representation that disentangles its shape, appearance and pose. Each object representation defines a localized neural radiance field used to generate 2D views of the scene through differentiable rendering. Our model is subsequently trained by minimizing a reconstruction loss between inputs and corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect in the rendered output. Finally, we demonstrate the usefulness of our 3D object representations in a visual reasoning task using the CATER dataset.
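The pipeline described in the abstract (per-object localized radiance fields, differentiable rendering, reconstruction loss) can be illustrated with a minimal sketch. This is not the authors' implementation: each learned radiance field is replaced here by a fixed Gaussian density blob with a constant color, objects are combined by a density-weighted color mix (one common compositing choice, assumed here), and all names (`object_density`, `render_ray`, `reconstruction_loss`, the `center`/`radius`/`color` fields) are hypothetical. The pose is reduced to a 3D translation for brevity.

```python
import math

def object_density(point, center, radius, peak=10.0):
    """Density of one localized object: a soft Gaussian blob around `center`.
    Stand-in for a learned per-object radiance field (assumption)."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return peak * math.exp(-d2 / (2.0 * radius ** 2))

def render_ray(origin, direction, objects, near=0.0, far=4.0, n_samples=64):
    """Composite all objects along one ray with volume-rendering quadrature.
    Returns the accumulated RGB color and the remaining transmittance."""
    delta = (far - near) / n_samples
    transmittance = 1.0
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th ray segment
        point = [o + t * d for o, d in zip(origin, direction)]
        sigmas = [object_density(point, ob["center"], ob["radius"]) for ob in objects]
        sigma = sum(sigmas)
        if sigma > 0.0:
            # Density-weighted mix of the object colors at this sample.
            mixed = [
                sum(s * ob["color"][k] for s, ob in zip(sigmas, objects)) / sigma
                for k in range(3)
            ]
            alpha = 1.0 - math.exp(-sigma * delta)
            weight = transmittance * alpha
            color = [c + weight * m for c, m in zip(color, mixed)]
            transmittance *= 1.0 - alpha
    return color, transmittance

def reconstruction_loss(rendered, target):
    """Mean squared error between a rendered pixel and the observed pixel."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)

# A toy scene: a red object on the camera ray and a blue object off to the side.
scene = [
    {"center": [0.0, 0.0, 2.0], "radius": 0.3, "color": [1.0, 0.0, 0.0]},
    {"center": [1.0, 0.0, 2.0], "radius": 0.3, "color": [0.0, 0.0, 1.0]},
]
pixel, remaining = render_ray([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], scene)

# Manipulating one object's pose changes the rendered output accordingly.
moved_scene = [dict(scene[0], center=[0.0, 3.0, 2.0]), scene[1]]
moved_pixel, _ = render_ray([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], moved_scene)
```

In this toy setup the ray through the red object renders a predominantly red pixel, and translating that object off the ray removes it from the pixel, which mirrors the scene-manipulation check the abstract mentions. In a trained model the loss would be backpropagated through the renderer to the encoder that infers each object's representation; that step is omitted here.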