The Teacher-Student Interactive Cycle: Joint Optimization with Inner-Loop Self-Distillation in Prompted Foundation Models for Efficient Semantic Segmentation
Abstract: In semantic segmentation, the high computational cost of deep models is a major barrier to deployment on edge devices. Among efficiency-oriented methods, knowledge distillation has emerged as a promising technique for transferring knowledge from large models to lightweight networks. However, current knowledge distillation methods for efficient semantic segmentation still face two key challenges: (1) they often rely on large offline pre-trained teacher networks that remain fixed during training, and (2) they lack joint optimization mechanisms that enable effective teacher-student interaction in pixel-wise dense prediction. As a result, mutual learning strategies originally designed for image-level classification often fail to capture the fine-grained consistency required for semantic segmentation. To address these two challenges, we propose a novel training framework for efficient semantic segmentation, termed Teacher-Student Interactive Cycle (TSIC). Specifically, TSIC integrates a lightweight student network into a prompt-based foundation model as a prompted segmentor to assist an online-trained teacher. The student provides coarse mask prompts to guide the teacher, while the teacher offers fine-grained supervision through posterior probabilities and intermediate feature maps. This loop enables joint online optimization without relying on offline pre-trained teachers and fosters effective bidirectional communication. Extensive experiments on several benchmark datasets, including Cityscapes, Pascal VOC, CamVid, and ADE20K, demonstrate the effectiveness of TSIC. Compared to previous methods, TSIC achieves superior segmentation mIoU in most scenarios. Our code will be made publicly available at https://github.com/CV-ShuchangLyu/TSIC.
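The teacher-to-student half of the loop, supervision through posterior probabilities and intermediate feature maps, can be illustrated with a generic pixel-wise distillation objective. The sketch below is not taken from the TSIC code and may differ from the paper's actual losses: the function names, the weights `alpha` and `beta`, and the `temperature` parameter are assumptions introduced for illustration, and the student-to-teacher mask-prompt path is not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over the class logits of a single pixel."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_kl(teacher_logits, student_logits, temperature=1.0):
    """Mean over pixels of KL(teacher || student).

    Each argument is a list of pixels; each pixel is a list of class logits.
    """
    total = 0.0
    for t_pix, s_pix in zip(teacher_logits, student_logits):
        p = softmax(t_pix, temperature)
        q = softmax(s_pix, temperature)
        # Clamp q to avoid log(0) when the student is extremely confident.
        total += sum(
            pi * math.log(pi / max(qi, 1e-12))
            for pi, qi in zip(p, q)
            if pi > 0.0
        )
    return total / len(teacher_logits)


def feature_mse(teacher_feats, student_feats):
    """Mean squared error between two flattened feature maps of equal length."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_feats, student_feats)]
    return sum(diffs) / len(diffs)


def distillation_loss(t_logits, s_logits, t_feats, s_feats,
                      alpha=1.0, beta=1.0, temperature=1.0):
    """Hypothetical weighted sum of the posterior and feature terms."""
    return (alpha * pixelwise_kl(t_logits, s_logits, temperature)
            + beta * feature_mse(t_feats, s_feats))
```

In an online setting such as the one the abstract describes, the teacher's logits and features would come from a network that is itself being updated, rather than from a frozen pre-trained model; the loss form above does not depend on that choice.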
External IDs: doi:10.1109/tcsvt.2026.3665905