- Abstract: In this paper, we propose a generative model, called ROOTS (Representation of Object-Oriented Three-dimension Scenes), for unsupervised object-wise 3D-scene decomposition and and rendering. For 3D scene modeling, ROOTS bases on the Generative Query Networks (GQN) framework, but unlike GQN, provides object-oriented representation decomposition. The inferred object-representation of ROOTS is 3D in the sense that it is viewpoint invariant as the full scene representation of GQN is so. ROOTS also provides hierarchical object-oriented representation: at 3D global-scene level and at 2D local-image level. We achieve this without performance degradation. In experiments on datasets of 3D rooms with multiple objects, we demonstrate the above properties by focusing on its abilities for disentanglement, compositionality, and generalization in comparison to GQN.
- Keywords: unsupervised learning, representation learning, 3D scene decomposition, 3D detection