SlotSAM: Bootstrap Segmentation Foundation Model under Real-world Shifts via Object-Centric Learning

Luyao Tang; Yuxuan Yuan; Kunze Huang; Xinghao Ding; Chaoqi Chen; Yue Huang

24 Sept 2024 (modified: 20 Nov 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0

Keywords: segmentation foundation model, distribution shift, object-centric learning, weakly supervised

Abstract: Foundation models have made substantial progress in zero-shot and few-shot generalization, leveraging prompt engineering to mimic the problem-solving approach of human intelligence. However, some foundation models, such as Segment Anything, still struggle to perform well under real-world shifts. One such shift is distribution shift: out-of-distribution data such as camouflaged and medical images. Another is the use of inconsistent prompting strategies between fine-tuning and testing, which degrades performance. We draw inspiration from human intelligence, particularly the way individuals in unfamiliar environments decompose a scene into components to determine the position or boundary of each component. To this end, we introduce SlotSAM, a method that reconstructs features from the encoder in a self-supervised manner to create object-centric representations. These representations are then integrated into the foundation model, strengthening its object-level perceptual capabilities while reducing the impact of distribution-related variables. SlotSAM is simple and adaptable to various tasks, making it a versatile solution that significantly enhances the generalization abilities of foundation models. Through limited parameter fine-tuning in a bootstrap manner, our approach paves the way for improved generalization in novel environments.
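
The idea of reconstructing encoder features into object-centric representations can be illustrated with a minimal sketch. The abstract does not give the architecture, so everything below is an assumption: it uses a simplified, parameter-free variant of slot attention (no learned projections, GRU, or MLP) in NumPy, and the names `slot_attention`, `reconstruct`, and `reconstruction_loss` are hypothetical. It is not the authors' implementation. It only shows how a set of slots can compete for feature tokens and how the slots can be decoded back into features for a self-supervised reconstruction objective.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, num_slots=4, iters=3, seed=0):
    """Simplified slot attention (assumption: no learned weights).

    features: (N, D) array of encoder tokens.
    Returns slots of shape (K, D) and attention of shape (K, N).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    slots = rng.normal(size=(num_slots, d))  # random slot initialization
    for _ in range(iters):
        logits = slots @ features.T / np.sqrt(d)  # (K, N) similarities
        # Softmax over the slot axis: slots compete for each token.
        attn = softmax(logits, axis=0)
        # Normalize per slot so each update is a weighted mean of tokens.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ features
    return slots, attn


def reconstruct(slots, attn):
    # Decode each token as an attention-weighted mixture of slots: (N, D).
    return attn.T @ slots


def reconstruction_loss(features, recon):
    # Mean squared error, the self-supervised signal in this sketch.
    return float(np.mean((features - recon) ** 2))


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(16, 8))  # stand-in for encoder features
    slots, attn = slot_attention(feats)
    loss = reconstruction_loss(feats, reconstruct(slots, attn))
    print(slots.shape, attn.shape, loss)
```

In a trainable version, the slot initialization, the projections, and the decoder would be learned by minimizing the reconstruction loss; how the resulting slots are integrated into the segmentation model is not specified in the abstract and is left out here.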

Supplementary Material: pdf

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 3504