Discriminatively Matched Part Tokens for Pointly Supervised Instance Segmentation

18 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Instance Segmentation; Pointly Supervision; Visual Transformer; Part-based Model; Segment Anything Model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The self-attention mechanism of vision transformers has demonstrated potential for instance segmentation even when using a single point as supervision. However, for objects with significant deformation and variation in appearance, this attention mechanism encounters the challenge of semantic variation among object parts. In this study, we propose discriminatively matched part tokens (DMPT) to extend the capacity of self-attention for pointly supervised instance segmentation. DMPT first allocates a token to each object part by finding a semantic extreme point, and then introduces part classifiers with a deformable constraint to re-estimate the part tokens, which are utilized to guide and enhance the fine-grained localization capability of the self-attention mechanism. Through iterative optimization, DMPT matches the most discriminative part tokens, which facilitates capturing fine-grained semantics and activating the full object extent. Extensive experiments on the PASCAL VOC and MS-COCO segmentation datasets show that DMPT improves upon the state-of-the-art method by 2.0% mAP50 and 1.6% AP respectively, achieving the best performance under point supervision. DMPT can also be combined with the Segment Anything Model (SAM), demonstrating great potential to reform point prompt learning. Code is enclosed in the supplementary material.
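The abstract's first step, allocating a token per object part by finding semantic extreme points, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `find_part_token_peaks`, the greedy non-maximum-suppression strategy, and all parameters are hypothetical stand-ins for the paper's extreme-point search over a semantic/attention map.

```python
import numpy as np

def find_part_token_peaks(attn_map, num_parts=3, min_distance=2):
    """Pick local maxima of a 2D attention map as candidate part-token
    locations -- an illustrative stand-in for DMPT's 'semantic extreme
    points' (hypothetical, not the paper's actual algorithm)."""
    h, w = attn_map.shape
    work = attn_map.astype(float).copy()
    peaks = []
    # Greedy non-maximum suppression: repeatedly take the strongest
    # remaining location, then mask out its neighborhood so the next
    # peak lands on a different object part.
    for _ in range(num_parts):
        y, x = np.unravel_index(np.argmax(work), work.shape)
        if work[y, x] == -np.inf:
            break  # map exhausted before num_parts peaks were found
        peaks.append((int(y), int(x)))
        y0, y1 = max(0, y - min_distance), min(h, y + min_distance + 1)
        x0, x1 = max(0, x - min_distance), min(w, x + min_distance + 1)
        work[y0:y1, x0:x1] = -np.inf
    return peaks

# Toy attention map with two bright regions standing in for object parts.
attn = np.zeros((8, 8))
attn[1, 1] = 0.9
attn[6, 6] = 0.8
print(find_part_token_peaks(attn, num_parts=2))  # [(1, 1), (6, 6)]
```

In the paper these peak locations would seed the part tokens that the part classifiers then re-estimate under a deformable constraint; the sketch only covers the initial allocation step.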
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1093