Keywords: feature learning, representation learning, interpretability
Abstract: We propose a self-supervised learning framework that organizes hidden feature representations across layers to enhance interpretability. The framework first discovers unit-level structures by comparing activation patterns across data samples. Building on these structures, we introduce a structure-aware regularization objective that (i) promotes feature reuse across layers via identity mappings and (ii) encourages the emergence of representative units that serve as anchors for related features. This regularization yields clearer and more structured feature pathways, making the learned representations easier to interpret. Experiments demonstrate that our method induces structured feature pathways on synthetic data, improves interpretability on CIFAR-10 as measured by Grad-CAM++ metrics, and maintains competitive performance with slightly improved mean accuracy on both CIFAR-10 and ImageNet-1K.
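The abstract does not specify the form of the regularizer, but a minimal sketch may help make its two terms concrete. The code below is an illustration under assumed details, not the paper's actual objective: per-unit activation patterns over a batch are compared by cosine similarity, "feature reuse via identity mappings" is rendered as an identity-alignment penalty between matched units in consecutive layers, and "representative units" are chosen as the most broadly similar units within a layer. All names here (structure_reg, unit_patterns, n_anchors) are hypothetical.

    # A sketch of a structure-aware regularizer in the spirit of the
    # abstract; every design detail below is an assumption, since the
    # abstract does not define the objective.
    import torch
    import torch.nn.functional as F

    def unit_patterns(acts: torch.Tensor) -> torch.Tensor:
        """acts: (batch, units) activations from one layer. Returns
        L2-normalized per-unit activation patterns, shape (units, batch)."""
        return F.normalize(acts.t(), dim=1)

    def structure_reg(acts_l: torch.Tensor, acts_lp1: torch.Tensor,
                      n_anchors: int = 8) -> torch.Tensor:
        """Hypothetical regularizer combining (i) cross-layer feature
        reuse and (ii) anchor-based grouping of related units."""
        p_l, p_lp1 = unit_patterns(acts_l), unit_patterns(acts_lp1)

        # (i) Feature reuse: match each unit in layer l+1 to its most
        # similar unit in layer l and reward an identity-like mapping
        # (high cosine similarity between the matched activation patterns).
        sim = p_lp1 @ p_l.t()                       # (units_{l+1}, units_l)
        reuse_loss = (1.0 - sim.max(dim=1).values).mean()

        # (ii) Representative units: pick the n_anchors units with the
        # highest mean similarity to all units in the layer, then pull
        # every unit toward its nearest anchor's activation pattern.
        unit_sim = p_lp1 @ p_lp1.t()                # (units, units)
        anchor_idx = unit_sim.mean(dim=1).topk(n_anchors).indices
        anchors = p_lp1[anchor_idx]                 # (n_anchors, batch)
        anchor_loss = (1.0 - (p_lp1 @ anchors.t()).max(dim=1).values).mean()

        return reuse_loss + anchor_loss

In training, such a term would presumably be added to the self-supervised loss with a tunable weight, e.g. loss = ssl_loss + lam * structure_reg(acts_l, acts_lp1).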
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4212