Keywords: feature learning, representation learning, interpretability
Abstract: We propose a self-supervised learning framework that organizes hidden feature representations across layers to enhance interpretability. The framework first discovers unit-level structures by comparing activation patterns across data samples. Building on these structures, we introduce a structure-aware regularization objective that (i) promotes feature reuse across layers via identity mappings and (ii) encourages the emergence of representative units that serve as anchors for related features. This regularization yields clearer and more structured feature pathways, making the learned representations easier to interpret. Experiments demonstrate that our method induces structured feature pathways on synthetic data, improves interpretability on CIFAR-10 as measured by Grad-CAM++ metrics, and maintains competitive performance with slightly improved mean accuracy on both CIFAR-10 and ImageNet-1K.
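The abstract does not specify the form of the regularizer, but a minimal sketch may help make its two terms concrete. The code below is an illustration under assumed details, not the paper's actual objective: per-unit activation patterns over a batch are compared by cosine similarity, "feature reuse via identity mappings" is rendered as an identity-alignment penalty between matched units in consecutive layers, and "representative units" are chosen as the most broadly similar units within a layer. All names here (structure_reg, unit_patterns, n_anchors) are hypothetical.

    # A sketch of a structure-aware regularizer in the spirit of the
    # abstract; every design detail below is an assumption, since the
    # abstract does not define the objective.
    import torch
    import torch.nn.functional as F

    def unit_patterns(acts: torch.Tensor) -> torch.Tensor:
        """acts: (batch, units) activations from one layer. Returns
        L2-normalized per-unit activation patterns, shape (units, batch)."""
        return F.normalize(acts.t(), dim=1)

    def structure_reg(acts_l: torch.Tensor, acts_lp1: torch.Tensor,
                      n_anchors: int = 8) -> torch.Tensor:
        """Hypothetical regularizer combining (i) cross-layer feature
        reuse and (ii) anchor-based grouping of related units."""
        p_l, p_lp1 = unit_patterns(acts_l), unit_patterns(acts_lp1)

        # (i) Feature reuse: match each unit in layer l+1 to its most
        # similar unit in layer l and reward an identity-like mapping
        # (high cosine similarity between the matched activation patterns).
        sim = p_lp1 @ p_l.t()                       # (units_{l+1}, units_l)
        reuse_loss = (1.0 - sim.max(dim=1).values).mean()

        # (ii) Representative units: pick the n_anchors units with the
        # highest mean similarity to all units in the layer, then pull
        # every unit toward its nearest anchor's activation pattern.
        unit_sim = p_lp1 @ p_lp1.t()                # (units, units)
        anchor_idx = unit_sim.mean(dim=1).topk(n_anchors).indices
        anchors = p_lp1[anchor_idx]                 # (n_anchors, batch)
        anchor_loss = (1.0 - (p_lp1 @ anchors.t()).max(dim=1).values).mean()

        return reuse_loss + anchor_loss

In training, such a term would presumably be added to the self-supervised loss with a tunable weight, e.g. loss = ssl_loss + lam * structure_reg(acts_l, acts_lp1).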
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4212