Abstract: A crucial ability of human intelligence is to build up models of individual 3D objects from
partial scene observations. Recent works either achieve object-centric generation but without
the ability to infer the representation, or achieve 3D scene representation learning but without
object-centric compositionality. Therefore, learning to both represent and render 3D scenes
with object-centric compositionality remains elusive. In this paper, we propose a probabilistic
generative model for learning to build modular and compositional 3D object models from
partial observations of a multi-object scene. The proposed model can (i) infer the 3D object
representations by learning to search and group object areas, and also (ii) render from an
arbitrary viewpoint not only individual objects but also the full scene by compositing the
objects. The entire learning process is unsupervised and end-to-end. In experiments, in
addition to generation quality, we also demonstrate that the learned representation permits
object-wise manipulation and novel scene generation, and generalizes to various settings.
Results can be found on our project website: https://sites.google.com/view/roots3d.
0 Replies
Loading