Seeing the part and knowing the whole: Object-Centric Learning with Inter-Feature Prediction

23 Sept 2024 (modified: 15 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Object-Centric Learning, Self-Supervised Learning, Computer Vision
Abstract: Humans naturally decompose scenes into understandable objects, which underlies their strong visual comprehension. In light of this, Object-Centric Learning (OCL) seeks to construct object-level representations by encoding the objects in a scene into a set of object vectors referred to as 'slots'. Current OCL models rely on an auto-encoding paradigm that encodes image features into slots and reconstructs the image by composing the slots. However, a reconstruction objective alone does not guarantee that each slot corresponds to a holistic object. Existing methods often fail when objects have complex appearances, because the reconstruction objective cannot indicate which pixels should be assigned to the same slot. Additional regularization based on a more general prior is therefore required. To this end, we draw on the Gestalt tendency of humans to complete a broken figure and perceive it as a whole, and propose the Predictive Prior: features belonging to the same object tend to be able to predict each other. We implement this prior as an external loss function that requires the model to assign mutually predictable features to the same slot and mutually unpredictable features to different slots. Experiments on multiple datasets demonstrate that our model outperforms previous models by a large margin in complex environments where objects have irregular outlines and intense color changes, across tasks including object discovery, compositional generation, and visual question answering. Visualization results verify that our model discovers objects holistically rather than dividing them into multiple parts, showing that the Predictive Prior provides a more general object definition. Code is available at https://anonymous.4open.science/r/PredictivePrior-32EF.
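The abstract describes the Predictive Prior as an external loss that pushes mutually predictable features into the same slot and mutually unpredictable features into different slots. The paper's actual formulation is in the linked code; below is only a minimal NumPy sketch of the general idea, where `pred_sim` (a hypothetical pairwise predictability score in [0, 1]) is assumed to be given, e.g. from a small auxiliary predictor network:

```python
import numpy as np

def predictive_prior_loss(pred_sim: np.ndarray, slot_attn: np.ndarray) -> float:
    """Sketch of a predictive-prior-style regularizer (not the paper's exact loss).

    pred_sim:  (N, N) pairwise predictability between features, in [0, 1]
               (hypothetical score; 1 means the pair predicts each other well).
    slot_attn: (N, K) soft slot-assignment weights, rows summing to 1.
    """
    # Agreement matrix: soft probability that features i and j share a slot.
    agree = slot_attn @ slot_attn.T  # (N, N), entries in [0, 1]
    eps = 1e-8
    # Cross-entropy-style penalty: predictable pairs should agree on a slot,
    # unpredictable pairs should be assigned to different slots.
    loss = -(pred_sim * np.log(agree + eps)
             + (1.0 - pred_sim) * np.log(1.0 - agree + eps))
    return float(loss.mean())
```

With hard one-hot assignments, the loss is near zero when the predictability structure matches the slot grouping and grows when predictable pairs are split across slots, which is the behavior the prior asks for.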
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2939
