Keywords: non-object-centric data, self-supervised learning, representation learning, visual pre-training, object discovery
Abstract: Scaling up data and compute has become the norm for pre-training powerful visual encoders. Current algorithms, when scaled up, often require training on large-scale datasets that are unlikely to be object-centric. However, these algorithms were typically developed and validated on object-centric ImageNet. This discrepancy may suggest sub-optimal scalability and underutilized data potential. Non-object-centric (NOC) data, with its multiple objects and complex layouts, tends to be more information-dense. To better leverage this underlying structure, we introduce a semantic bottleneck into masked image modeling (MIM): reducing the number of prototypes encourages objectness to emerge in patch-level token representations. We further apply cross-view consistency regularization to encourage multiview invariance. Together, these components induce semantic object discovery and enable instance discrimination between object-level features (slots). Our experiments encompass pre-training on object-centric, scene-centric, web-crawled, and ego-centric data. Across all settings, our approach learns transferable representations and achieves significant improvements over prior work on image recognition, scene understanding, and robot learning evaluations. When scaled up to million-scale datasets, our method also demonstrates superior data efficiency and scalability. We will make our code and model artifacts publicly available.
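The full method is specified in the paper itself; as a rough illustration of the semantic-bottleneck and cross-view-consistency ideas the abstract sketches, here is a minimal PyTorch example. All names, shapes, the temperature, and the exact loss form are assumptions made for illustration and may differ from the authors' actual formulation; the sketch also assumes the patch grids of the two augmented views are spatially aligned, which real crop augmentations would not guarantee.

```python
import torch
import torch.nn.functional as F

def prototype_assignments(patch_tokens, prototypes, temperature=0.1):
    # Soft-assign each patch token to one of K prototypes. Keeping K small
    # acts as the semantic bottleneck: patches from the same object are
    # pushed toward the same prototype, encouraging objectness to emerge.
    tokens = F.normalize(patch_tokens, dim=-1)   # (B, N, D)
    protos = F.normalize(prototypes, dim=-1)     # (K, D)
    logits = tokens @ protos.t() / temperature   # (B, N, K)
    return logits.softmax(dim=-1)

def cross_view_consistency(assign_a, assign_b):
    # Cross-entropy between prototype assignments of two augmented views,
    # encouraging multiview-invariant grouping. Assumes aligned patch grids
    # across views (a simplification for this sketch).
    return -(assign_b.detach() * torch.log(assign_a + 1e-8)).sum(-1).mean()

# Toy usage with random features standing in for ViT patch tokens.
B, N, D, K = 2, 196, 256, 64   # K is deliberately small (the bottleneck)
prototypes = torch.randn(K, D, requires_grad=True)
view_a = torch.randn(B, N, D)
view_b = torch.randn(B, N, D)
loss = cross_view_consistency(
    prototype_assignments(view_a, prototypes),
    prototype_assignments(view_b, prototypes),
)
```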
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3989