Spatial Frequency-Aware Self-Distillation for Weakly-Supervised Semantic Segmentation

Jingyuan Fang, Yang Ning, Xiushan Nie

Published: 06 Mar 2025, Last Modified: 15 May 2025, ICASSP 2025, CC BY 4.0

Abstract: Weakly-supervised semantic segmentation (WSSS) aims to achieve pixel-level classification using only image-level supervision. Recent methods based on class activation maps (CAMs) seek to expand foreground activation while suppressing background activation. However, they often overlook the uncertainty of CAMs: in some regions, activation is non-salient, which complicates semantic classification. These regions are typically dismissed as noise and, lacking adequate regularization, end up with inappropriate activations. To resolve this, we introduce a Spatial Frequency-Aware Self-Distillation strategy (SFS). First, to enhance the perception of high-frequency spatial information in uncertain regions, we propose a boundary self-distillation and uncertain region reconstruction strategy, which captures high-frequency boundary information and fine-grained spatial context in these regions. Second, to enhance the discrimination of low-frequency semantic features, we propose a contrastive attention mechanism that guides the Vision Transformer (ViT) to focus more on the foreground, thereby improving the distinction between foreground and background. Finally, SFS achieves strong performance on both the VOC 2012 and COCO 2014 datasets, which we attribute to its superior spatial frequency perception capabilities. The code is available at https://github.com/fjoybest/SFS.
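To make the abstract's vocabulary concrete, here is a minimal NumPy sketch. It is not the SFS implementation, and it rests on several assumptions. A CAM is computed as a ReLU'd, max-normalized weighted sum of feature channels. "Uncertain" pixels are taken to be those whose CAM score falls between two hypothetical thresholds (`low`, `high`). An ideal FFT low-pass mask, a swapped-in illustration rather than anything the abstract specifies, separates a map into low- and high-frequency components. All function names and threshold values are hypothetical.

```python
import numpy as np


def compute_cam(features, class_weights, class_idx):
    """Class activation map for one class.

    features: (C, H, W) feature maps; class_weights: (num_classes, C)
    classifier weights. Returns an (H, W) map scaled to [0, 1].
    """
    # Weighted sum over channels -> (H, W)
    cam = np.tensordot(class_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # ReLU: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def partition_regions(cam, low=0.25, high=0.55):
    """Label pixels 0 = background, 1 = uncertain, 2 = foreground.

    The dual thresholds are hypothetical values chosen for illustration.
    """
    labels = np.ones(cam.shape, dtype=np.int64)  # default: uncertain
    labels[cam < low] = 0
    labels[cam >= high] = 2
    return labels


def frequency_split(x, radius):
    """Split a 2-D map into low- and high-frequency parts.

    Uses an ideal circular low-pass mask on the centered FFT spectrum;
    this filter is an illustrative assumption, not part of SFS.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(x))  # zero frequency at center
    h, w = x.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    high = x - low  # residual holds edges and fine detail
    return low, high


# Toy example: channel 0 ramps left to right; channel 1 gets zero weight.
features = np.zeros((2, 4, 4))
features[0] = np.tile([0.0, 0.3, 0.6, 1.0], (4, 1))
features[1] = 1.0
weights = np.array([[1.0, 0.0]])

cam = compute_cam(features, weights, class_idx=0)
print(partition_regions(cam)[0])  # [0 1 2 2]
low, high = frequency_split(cam, radius=1)
```

In the toy map, the second column lands in the "uncertain" band. In SFS, such regions are the target of boundary self-distillation and uncertain region reconstruction. This sketch only locates them and does not reproduce those training objectives.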