Efficient Object-Centric Representation Learning using Masked Generative Modeling

TMLR Paper4762 Authors

30 Apr 2025 (modified: 08 Sept 2025)
Decision pending for TMLR
License: CC BY 4.0
Abstract: Unsupervised learning of object-centric representations from visual inputs has attracted growing attention as a foundation for more complex tasks, such as reasoning and reinforcement learning. However, current state-of-the-art methods, which rely on autoregressive transformers or diffusion models to generate scenes from object-centric representations, are computationally inefficient because of their sequential or iterative nature. This bottleneck limits their practical application and hinders scaling to more complex downstream tasks. To address this, we propose MOGENT, an efficient object-centric learning framework based on masked generative modeling. MOGENT conditions a masked bidirectional transformer on learned object slots and employs a parallel iterative decoding scheme to generate scenes, enabling efficient compositional generation. Experiments show that MOGENT significantly improves computational efficiency, accelerating generation by up to 67x compared with autoregressive models and up to 17x compared with diffusion-based models. Importantly, these gains are achieved while maintaining strong or competitive performance on object segmentation and compositional generation tasks.
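The abstract names a parallel iterative decoding scheme without spelling out its steps. The sketch below assumes a MaskGIT-style procedure, which is our assumption rather than something the abstract confirms: start from a fully masked token sequence, let a slot-conditioned bidirectional model predict every masked position in one forward pass, commit the most confident samples, and leave the rest masked according to a cosine schedule. `parallel_decode`, `toy_predict`, `VOCAB`, and the slot values are hypothetical placeholders standing in for the learned transformer and slots, and the confidence-noise annealing used in MaskGIT is omitted for brevity.

```python
import math
import random
from typing import Callable, List, Sequence

MASK = -1   # sentinel id marking a masked (not yet generated) token position
VOCAB = 8   # hypothetical codebook size for the discrete image tokens


def cosine_mask_ratio(progress: float) -> float:
    """Fraction of tokens that should remain masked at progress in [0, 1]."""
    return math.cos(math.pi / 2 * progress)


def parallel_decode(
    predict: Callable[[List[int], Sequence[float]], List[List[float]]],
    num_tokens: int,
    num_steps: int,
    slots: Sequence[float],
    rng: random.Random,
) -> List[int]:
    """Fill a fully masked token sequence in at most num_steps parallel passes."""
    tokens = [MASK] * num_tokens
    for step in range(1, num_steps + 1):
        if MASK not in tokens:
            break  # everything is already committed; skip further passes
        # One forward pass yields a distribution for every position at once.
        probs = predict(tokens, slots)
        candidates = []  # (confidence, position, sampled token id)
        for i, tok in enumerate(tokens):
            if tok == MASK:
                p = probs[i]
                k = rng.choices(range(len(p)), weights=p)[0]
                candidates.append((p[k], i, k))
        # Number of positions that should still be masked after this step.
        target_masked = math.floor(num_tokens * cosine_mask_ratio(step / num_steps))
        # Commit at least one token per step, and everything on the final step.
        num_commit = max(1, len(candidates) - target_masked)
        if step == num_steps:
            num_commit = len(candidates)
        # Keep the most confident samples; the others stay masked for the next pass.
        candidates.sort(reverse=True)
        for _, i, k in candidates[:num_commit]:
            tokens[i] = k
    return tokens


def toy_predict(tokens: List[int], slots: Sequence[float]) -> List[List[float]]:
    """Hypothetical stand-in for a slot-conditioned bidirectional transformer."""
    bias = int(sum(slots)) % VOCAB
    dists = []
    for i in range(len(tokens)):
        favored = (i + bias) % VOCAB
        dists.append([0.65 if v == favored else 0.05 for v in range(VOCAB)])
    return dists


calls = []  # one entry per forward pass of the (toy) model


def counting_predict(tokens: List[int], slots: Sequence[float]) -> List[List[float]]:
    calls.append(len(tokens))
    return toy_predict(tokens, slots)


image_tokens = parallel_decode(
    counting_predict, num_tokens=256, num_steps=8,
    slots=[0.3, 1.2, 2.5], rng=random.Random(0),
)
print(f"forward passes: {len(calls)}")  # prints "forward passes: 8"
```

With 256 tokens and 8 steps, this toy decoder needs 8 forward passes, whereas token-by-token autoregressive decoding of the same 256 tokens needs 256 sequential passes. That step-count gap illustrates where parallel decoding saves time; the speedups reported in the abstract are measured empirically and are not derived from this toy count.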
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In response to the comments from the action editor, we have made the following modifications to the manuscript:
- We have added an experiment comparing MOGENT against SlotDiffusion with different solver hyperparameters to better highlight the efficiency-quality trade-off (Appendix B.7).
- We have updated Section 2 and Appendix D to provide a more comprehensive overview of recent advances in accelerating generative models.
- We have fixed the image aspect ratios for the CLEVR and CLEVRTex datasets.
- We have proofread the entire manuscript, including both the main text and the appendices.
Assigned Action Editor: ~Grigorios_Chrysos1
Submission Number: 4762