Boosting Weakly-Supervised Image Segmentation via Representation, Transform, and Compensator

Chunyan Wang, Dong Zhang, Rui Yan

Published: 2024, Last Modified: 05 Jan 2026. IEEE Trans. Circuits Syst. Video Technol. 2024. License: CC BY-SA 4.0
Abstract: Weakly-supervised image segmentation (WSIS) is a fundamental computer vision task that relies only on image-level class labels. Existing WSIS methods have made significant progress by using multi-stage training procedures to obtain high-quality pseudo-masks that serve as ground truth, but single-stage WSIS methods have recently gained attention because they can simplify the training procedure. However, single-stage methods suffer from low-quality pseudo-masks, which limits their practical applications. To address this problem, this paper proposes a novel single-stage WSIS method that uses a siamese network with contrastive learning to improve the quality of class activation maps (CAMs) and achieve a self-refinement result. The proposed method employs a cross-representation refinement method that expands reliable object regions by exploiting different feature representations from the backbone. In addition, a cross-transform regularization module is introduced that learns robust class prototypes for contrastive learning and captures global context information as feedback for the rough CAMs, thereby improving their quality. The final high-quality CAMs are used as pseudo-masks to supervise the segmentation result. Experimental results on the PASCAL VOC 2012 and COCO datasets demonstrate that the proposed method significantly outperforms other state-of-the-art methods, achieving 72.38% and 72.95% mIoU on the PASCAL VOC 2012 val and test sets, respectively, and 42.51% mIoU on the COCO val set. Furthermore, the proposed method has been extended to weakly supervised object localization, where experimental results show that it remains highly competitive. The source code has been released at https://github.com/ChunyanWang1/RTC.
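As a rough illustration of the last step in the abstract, where CAMs are used as pseudo-masks, the sketch below converts per-class CAMs into a label map by max-normalizing each CAM and taking the strongest class per pixel, with weak pixels falling back to background. This is a generic thresholding baseline, not the refinement procedure of the paper; the function name `cams_to_pseudo_mask`, the background label 0, and the threshold value 0.3 are assumptions made for this example.

```python
from typing import Dict, List

Grid = List[List[float]]


def cams_to_pseudo_mask(cams: Dict[int, Grid], bg_threshold: float = 0.3) -> List[List[int]]:
    """Turn per-class CAMs into a pseudo-mask (hypothetical helper).

    `cams` maps each class id present in the image-level labels (ids >= 1)
    to an H x W grid of activations. Label 0 is reserved for background.
    """
    if not cams:
        raise ValueError("at least one CAM is required")

    first = next(iter(cams.values()))
    height, width = len(first), len(first[0])

    # Normalize every CAM to [0, 1] by its own peak so classes are comparable.
    normalized: Dict[int, Grid] = {}
    for cls, cam in cams.items():
        peak = max(max(row) for row in cam)
        scale = peak if peak > 0 else 1.0
        normalized[cls] = [[max(v, 0.0) / scale for v in row] for row in cam]

    # Per pixel: pick the strongest class, or background if no class
    # exceeds the (assumed) background threshold.
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_cls, best_score = 0, bg_threshold
            for cls in sorted(normalized):
                score = normalized[cls][y][x]
                if score > best_score:
                    best_cls, best_score = cls, score
            mask[y][x] = best_cls
    return mask


example_cams = {
    1: [[4.0, 1.0], [0.0, 2.0]],
    2: [[0.0, 3.0], [0.3, 3.0]],
}
example_mask = cams_to_pseudo_mask(example_cams)
```

Only classes listed in the image-level labels are passed in, which is how the weak supervision constrains the mask; a fixed threshold is the simplest choice here and would normally be tuned on a validation set.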