Keywords: Dual-Structure, Self-Distilled Learning, Unsupervised Semantic Segmentation
TL;DR: a novel framework that performs self-distillation within a single network by transferring the stronger semantic representations learned in deeper layers to guide shallower layers, without relying on external teachers.
Abstract: Unsupervised semantic segmentation (USS) aims to assign semantic labels to pixels without human annotations, yet existing methods struggle to capture semantic structures across different abstraction levels. We propose Dual-Structure Self-Distilled Learning (DSSDL), a novel framework that performs self-distillation within a single network by transferring the stronger semantic representations learned in label space to guide shallower layers, without relying on external teachers. DSSDL integrates two complementary structures: (1) an affinity structure that performs binary pair classification over pairwise similarity scores and leverages a reversed directional mining strategy to preserve fine-grained local consistency; and (2) a cluster structure that derives semantic codes from global prototypes and aligns per-pixel predictions via a swapped prediction loss to encourage consistent global grouping. By jointly modeling both structures, DSSDL enforces semantic consistency at both local and global levels, resulting in coherent and robust segmentations. Our method achieves substantial improvements over the strong baseline STEGO, with accuracy and mIoU gains of +16.7 and +3.3 on COCO-Stuff, +14.8 and +3.2 on Cityscapes, and +8.2 and +11.5 on Potsdam-3, respectively.
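Illustrative sketch (not the authors' implementation): the abstract names two loss ingredients, binary pair classification over pairwise similarities and a swapped prediction loss over prototype-derived codes. The NumPy code below shows one common way such losses are written, under several assumptions: the swapped prediction loss follows the SwAV-style formulation with Sinkhorn-balanced codes, the binary pair targets are supplied externally as a hypothetical `pair_labels` matrix, and the reversed directional mining strategy is not modeled. All function names, temperatures, and iteration counts are placeholders chosen for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(x):
    # Numerically stable row-wise log-softmax.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def sinkhorn_codes(scores, eps=0.05, n_iters=3):
    # Assumption: codes come from Sinkhorn-Knopp balancing, as in SwAV.
    # scores: (N, K) pixel-to-prototype similarities. Returns (N, K) soft codes.
    q = np.exp((scores - scores.max()) / eps).T  # (K, N)
    q /= q.sum()
    k, n = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True) * k  # each prototype gets total mass 1/K
        q /= q.sum(axis=0, keepdims=True) * n  # each pixel gets total mass 1/N
    return (q * n).T  # each row now sums to 1

def swapped_prediction_loss(feat_a, feat_b, prototypes, temperature=0.1):
    # feat_a, feat_b: (N, D) per-pixel features from two sources (hypothetical:
    # e.g. two branches or views). Each source predicts the other's codes.
    protos = l2_normalize(prototypes)
    s_a = l2_normalize(feat_a) @ protos.T
    s_b = l2_normalize(feat_b) @ protos.T
    q_a, q_b = sinkhorn_codes(s_a), sinkhorn_codes(s_b)
    logp_a = log_softmax(s_a / temperature)
    logp_b = log_softmax(s_b / temperature)
    loss_a = -(q_b * logp_a).sum(axis=1).mean()
    loss_b = -(q_a * logp_b).sum(axis=1).mean()
    return 0.5 * (loss_a + loss_b)

def pair_affinity_loss(feats, pair_labels, temperature=0.1):
    # Binary cross-entropy on pairwise cosine similarities treated as logits.
    # pair_labels: (N, N) array of 0/1 targets (hypothetical; 1 = same class).
    f = l2_normalize(feats)
    logits = (f @ f.T) / temperature
    # Stable form of BCE-with-logits: max(x, 0) - x * y + log(1 + exp(-|x|)).
    bce = np.maximum(logits, 0) - logits * pair_labels + np.log1p(np.exp(-np.abs(logits)))
    return bce.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_a = rng.normal(size=(16, 8))                   # 16 pixels, 8-dim features
    feat_b = feat_a + 0.01 * rng.normal(size=(16, 8))   # slightly perturbed copy
    prototypes = rng.normal(size=(4, 8))                # 4 global prototypes
    print("swapped prediction loss:", swapped_prediction_loss(feat_a, feat_b, prototypes))
    print("pair affinity loss:", pair_affinity_loss(feat_a, np.eye(16)))
```

In this sketch the affinity term acts on pixel pairs (local consistency) while the swapped prediction term acts through shared prototypes (global grouping), which mirrors the local/global split described in the abstract. How the two terms are weighted and combined is not stated there and is left out here.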
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6630