Do Neural Networks Learn Similar Subspaces? An Empirical Exploration of Joint Parametric Subspaces in Deep Neural Networks
Keywords: universal subspace, universality, mechanistic interpretability
TL;DR: Neural networks learn finite, low-rank, universal weight subspaces!
Abstract: Deep neural networks trained on diverse tasks with a shared architecture often exhibit overlapping representational structures, suggesting the presence of underlying commonalities in their learned parameters. In this work, we hypothesize that each layer of such networks contains a universal, low-dimensional weight subspace that is systematically utilized across tasks. While prior studies have alluded to related phenomena, we provide the first systematic empirical evidence supporting this hypothesis. By applying spectral decomposition techniques to the weight matrices of various architectures trained on a wide range of tasks and datasets, we identify sparse, universal subspaces that are consistently exploited, regardless of task or domain. Our findings offer new insights into the intrinsic organization of information within deep networks and raise important questions about the possibility of discovering these universal subspaces without the need for extensive data and computational resources. Furthermore, this inherent structure has significant implications for model reusability, multi-task learning, model merging, and the development of training and inference-efficient algorithms, potentially reducing the carbon footprint of large-scale neural models.
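The abstract's core procedure — spectral decomposition of weight matrices followed by a cross-task comparison of the resulting subspaces — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the weight matrices here are synthetic, constructed to share a hypothetical low-rank component, and the overlap measure (mean squared cosine of principal angles) is one standard choice among several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrices from two models trained on different tasks,
# built to share a common rank-8 component (illustrative construction only).
shared = rng.standard_normal((64, 8))
W_a = shared @ rng.standard_normal((8, 64)) + 0.1 * rng.standard_normal((64, 64))
W_b = shared @ rng.standard_normal((8, 64)) + 0.1 * rng.standard_normal((64, 64))

def top_left_singular_subspace(W, k):
    """Orthonormal basis for the top-k left singular subspace of W."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

k = 8
U_a = top_left_singular_subspace(W_a, k)
U_b = top_left_singular_subspace(W_b, k)

# Overlap = mean squared cosine of the principal angles between the two
# subspaces, in [0, 1]: 1.0 for identical subspaces, about k/d (here 8/64)
# in expectation for independent random subspaces.
overlap = np.linalg.norm(U_a.T @ U_b, "fro") ** 2 / k
print(f"subspace overlap: {overlap:.3f}")
```

For the planted shared component above, the overlap lands far above the ~0.125 random-subspace baseline, which is the kind of signal the paper's hypothesis predicts for real trained networks.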
Primary Area: interpretability and explainable AI
Submission Number: 12617