Online Agglomerative Pooling for Scalable Self-Supervised Universal Segmentation

27 Sept 2024 (modified: 14 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: self-supervised learning, universal image segmentation, zero-shot segmentation, graph pooling
TL;DR: Building on an efficient pseudo-mask generation algorithm, we propose an online pretraining framework with Query-wise Self-distillation that achieves state-of-the-art performance on self-supervised universal image segmentation.
Abstract: Recent self-supervised image segmentors have achieved promising zero-shot performance. However, their pretraining schedule is multi-stage, alternating between offline pseudo-mask generation and parameter updates, which leads to unstable training and sub-optimal solutions. To solve this issue, we present Online Agglomerative Pooling (OAP), which efficiently generates universal pseudo-masks and updates parameters simultaneously at each training step. Specifically, OAP contains a stack of instance pooling and semantic pooling layers. By using a layer-varied threshold, OAP generates multi-hierarchy masks that provide more visual detail for segmentation. Unlike MaskCut or Divide-and-Conquer, each OAP layer identifies connected nodes in parallel and can therefore generate universal pseudo-masks for a single image within tens of milliseconds. Moreover, to deploy OAP in online pretraining, we devise a teacher-student framework with Query-wise Self-distillation, in which each local-view query is aligned with its matched global-view query to learn local-to-global correspondence. Compared with multi-stage offline pretraining methods, our framework scales effectively to larger datasets while converging faster. Extensive experiments on the COCO, PASCAL VOC, Cityscapes, and UVO datasets show that our method achieves state-of-the-art performance on zero-shot instance segmentation, semantic segmentation, and panoptic segmentation. Our code and pretrained models will be released upon acceptance of this work.
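The abstract describes an agglomerative pooling layer that thresholds node affinities and finds connected components in parallel. Since the authors' code is not yet released, the following is a minimal sketch of what one such layer could look like, under our own assumptions: the function name, the cosine-affinity graph, and the min-label-propagation scheme for parallel connected components are illustrative choices, not the paper's actual implementation.

```python
import torch

def oap_pool_layer(feats: torch.Tensor, tau: float):
    """Hypothetical OAP-style layer: agglomerate N node features (N, D)
    into K pooled features (K, D) plus a pseudo-mask label per node.

    feats: node embeddings, e.g. ViT patch tokens.
    tau:   layer-specific affinity threshold; varying tau across the
           stacked layers would yield the multi-hierarchy masks the
           abstract describes.
    """
    x = torch.nn.functional.normalize(feats, dim=-1)
    n = len(x)
    adj = (x @ x.T) > tau                       # thresholded cosine affinity
    adj |= torch.eye(n, dtype=torch.bool)       # every node reaches itself

    # Parallel connected components via iterative min-label propagation:
    # each step is a single matrix op, so all nodes update simultaneously.
    labels = torch.arange(n)
    big = torch.full((n, n), n, dtype=torch.long)  # sentinel > any label
    while True:
        neigh = torch.where(adj, labels.unsqueeze(0).expand(n, -1), big)
        new = neigh.min(dim=1).values           # smallest neighbour label
        if torch.equal(new, labels):
            break
        labels = new

    groups = labels.unique()
    pooled = torch.stack([feats[labels == g].mean(dim=0) for g in groups])
    return pooled, labels                       # labels = mask assignment
```

Likewise, the Query-wise Self-distillation objective could be sketched as a Hungarian matching between student queries from a local crop and EMA-teacher queries from the global view, followed by a cosine alignment loss; again, the shapes, matching cost, and loss below are our assumptions rather than the paper's exact recipe.

```python
from scipy.optimize import linear_sum_assignment

def query_distill_loss(student_q: torch.Tensor, teacher_q: torch.Tensor):
    """student_q: (Ns, D) local-view queries; teacher_q: (Nt, D)
    global-view queries from the EMA teacher (treated as constants)."""
    s = torch.nn.functional.normalize(student_q, dim=-1)
    t = torch.nn.functional.normalize(teacher_q.detach(), dim=-1)
    cost = (1.0 - s @ t.T).detach().cpu().numpy()  # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)       # optimal query matching
    # Pull each matched local-view query toward its global-view counterpart.
    return (1.0 - (s[rows] * t[cols]).sum(dim=-1)).mean()
```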
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11556