Abstract: Knowledge distillation (KD) has become a cornerstone for compressing deep neural networks, allowing a smaller student model to learn from a larger teacher model. In the context of semantic segmentation, traditional KD methods primarily focus on pixel-level feature alignment, where the student model is trained to match the teacher’s features at each pixel. Despite yielding performance improvements, pixel-level alignment can introduce noise and redundant information, particularly in complex scenes, and often overlooks the global structural context that is crucial for robust segmentation. To overcome these limitations, we propose Global Structural Knowledge Distillation (GSKD), a novel approach that moves beyond dense pixel-level alignment. Instead of aligning features pixel by pixel, we focus on capturing and transferring the global structural information within an image. Our method begins with Class-Balanced Sampling (CBS), which ensures that representative features from all classes are sampled evenly from the teacher’s feature maps, so that both common and rare classes are represented and class imbalance is mitigated. Next, we construct a Global Structural Similarity Map (GSSM) for both the teacher and student models. This map encodes the key structural patterns of the image by computing pairwise similarities between the sampled points, thereby summarizing the structural layout of the scene. To enhance the knowledge transfer process, we generate Sub-Image Descriptors (SID) through row-wise shuffling and column-wise grouping of the GSSM. These descriptors allow the student model to capture high-level semantic relationships and structural patterns that dense pixel-level feature alignment misses. The proposed method is designed to be flexible: it can be used both as a standalone method and as a plug-and-play module integrated with existing KD techniques. Our extensive experiments demonstrate that GSKD consistently outperforms or matches recent KD methods in standalone settings and significantly enhances the performance of state-of-the-art KD methods when incorporated as a plug-and-play module.
DOI: 10.1109/ACCESS.2025.3575066
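To make the three-step pipeline (CBS, GSSM, SID) concrete, below is a minimal PyTorch sketch of how such a distillation loss might be assembled. Everything here is an illustrative assumption rather than the paper's exact formulation: the function names `class_balanced_indices`, `gssm`, `sub_image_descriptors`, and `gskd_loss` are hypothetical, cosine similarity is one plausible pairwise measure, MSE is one plausible descriptor-matching loss, and the sketch assumes single-image features where teacher and student share a spatial resolution and labels are already downsampled to match.

```python
import torch
import torch.nn.functional as F

def class_balanced_indices(labels, num_per_class=32, ignore_index=255):
    """CBS sketch: pick up to `num_per_class` pixel indices for every class
    present in `labels` (H, W), so rare classes are not drowned out."""
    flat = labels.view(-1)
    picked = []
    for cls in flat.unique():
        if cls.item() == ignore_index:
            continue
        idx = (flat == cls).nonzero(as_tuple=True)[0]
        perm = torch.randperm(idx.numel())[:num_per_class]
        picked.append(idx[perm])
    return torch.cat(picked)                       # (N,) pixel indices

def gssm(features, indices):
    """GSSM sketch: pairwise cosine similarities between sampled points.
    features: (C, H, W) -> (N, N) similarity map."""
    pts = features.flatten(1).t()[indices]         # (N, C) sampled vectors
    pts = F.normalize(pts, dim=1)
    return pts @ pts.t()

def sub_image_descriptors(sim_map, row_perm, num_groups=4):
    """SID sketch: shuffle rows with a shared permutation, then split the
    columns into groups; each chunk serves as one descriptor."""
    return torch.chunk(sim_map[row_perm], num_groups, dim=1)

def gskd_loss(teacher_feats, student_feats, labels):
    """Match the student's descriptors to the teacher's. Both models are
    sampled at the same positions and shuffled with the same permutation,
    so the descriptors are directly comparable."""
    idx = class_balanced_indices(labels)
    row_perm = torch.randperm(idx.numel())
    t_desc = sub_image_descriptors(gssm(teacher_feats, idx), row_perm)
    s_desc = sub_image_descriptors(gssm(student_feats, idx), row_perm)
    return sum(F.mse_loss(s, t.detach()) for s, t in zip(s_desc, t_desc))

# Toy usage: 19-class labels, 64x64 feature maps with 256 channels.
teacher = torch.randn(256, 64, 64)
student = torch.randn(256, 64, 64)
labels = torch.randint(0, 19, (64, 64))
loss = gskd_loss(teacher, student, labels)
```

Because the loss depends only on pairwise relations among sampled points, it can be added to an existing KD objective as a weighted extra term, which is consistent with the plug-and-play usage described in the abstract.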