- TL;DR: We propose a generative latent variable model for unsupervised scene decomposition that provides a factorized object representation for each foreground object while also decomposing background segments of complex morphology.
- Abstract: We propose a generative latent variable model for unsupervised scene decomposition. Our model, SPACE, provides a unified probabilistic modeling framework that combines the strengths of previous models. SPACE explicitly provides a factorized object representation for each foreground object while also decomposing background segments of complex morphology. Previous models are good at one of these, but not both. With the proposed parallel-spatial attention, SPACE also resolves the scalability problem of previous methods, making the model applicable to scenes with a much larger number of objects without performance degradation. In addition, SPACE's foreground/background distinction is more effective and intuitive than that of other methods because, unlike them, SPACE can detect static objects that look like background. In experiments on Atari and 3D-Rooms, we show that SPACE consistently exhibits these properties in comparison to SPAIR, IODINE, and GENESIS.
- Keywords: unsupervised scene decomposition, object-oriented representation, segmentation, spatial attention