Cross-Layer Clustering for Stochastic Parameter Decomposition

Published: 02 Mar 2026, Last Modified: 02 Mar 2026. Venue: LIT Workshop @ ICLR 2026. License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: mechanistic interpretability, distributed circuits, stochastic parameter decomposition, co-activation analysis, multi-layer dependencies, language models, LLM interpretability, network decomposition
TL;DR: Cross-layer spectral clustering method to uncover distributed neural mechanisms by linking co-activated subcomponents across layers in language models
Abstract: Mechanistic interpretability seeks to decompose neural networks into interpretable circuits. Stochastic parameter decomposition (SPD; Bushnaq et al., 2025) yields sparse, atomic subcomponents within layers but does not capture the multi-layer pathways that drive complex behavior. We propose a cross-layer spectral clustering framework that automatically discovers these distributed mechanisms by analyzing co-activation patterns across inputs. By measuring the Pearson correlation of importance scores between subcomponents, we construct a similarity graph that links disjoint parts of the network contributing to the same computational task. On synthetic models with known circuits, our method recovers the ground-truth mechanistic structure, confirming its ability to identify cross-layer dependencies. When applied to small language models, we find multi-layer clusters whose top-activating examples suggest consistent linguistic functions (e.g., tracking salient entities and tense morphology). These clusters serve as hypotheses for follow-up causal tests, providing a scalable step toward discovering system-level mechanisms in language models.
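Illustrative sketch (not the authors' implementation): the pipeline described in the abstract, Pearson correlation of importance scores followed by spectral clustering on the resulting similarity graph, could look roughly like the Python below. The abstract does not specify how negative correlations are handled, which Laplacian normalization is used, or how the number of clusters is chosen, so clipping to non-negative correlations, the symmetric normalization, the fixed `k`, the toy data, and all function names here are assumptions made for illustration only.

```python
import numpy as np

def correlation_affinity(importance):
    """Similarity graph from an (n_inputs, n_subcomponents) importance matrix."""
    corr = np.corrcoef(importance, rowvar=False)  # Pearson correlation between columns
    corr = np.nan_to_num(corr)                    # constant columns would give NaN
    affinity = np.clip(corr, 0.0, None)           # assumption: drop negative correlations
    np.fill_diagonal(affinity, 0.0)               # no self-loops
    return affinity

def spectral_embed(affinity, k):
    """Top-k eigenvectors of the symmetrically normalized affinity, row-normalized."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    norm_aff = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(norm_aff)         # eigenvalues in ascending order
    emb = eigvecs[:, -k:]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)

def kmeans(points, k, n_iter=50, seed=0):
    """Small k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Toy data (hypothetical): two latent "tasks", each gating subcomponents in two layers.
rng = np.random.default_rng(0)
n_inputs = 500
group = np.array([0, 0, 1, 0, 1, 1])     # which task drives each subcomponent
layer_of = np.array([0, 0, 0, 1, 1, 1])  # which layer each subcomponent lives in
gates = rng.random((n_inputs, 2)) < 0.5  # task on/off per input
magnitude = rng.uniform(0.5, 1.0, (n_inputs, group.size))
noise = 0.05 * rng.random((n_inputs, group.size))
importance = gates[:, group] * magnitude + noise

affinity = correlation_affinity(importance)
labels = kmeans(spectral_embed(affinity, k=2), k=2)

for c in np.unique(labels):
    members = np.flatnonzero(labels == c)
    print(f"cluster {c}: subcomponents {members.tolist()}, layers {layer_of[members].tolist()}")
```

In this toy setting, subcomponents gated by the same latent task co-activate across inputs regardless of layer, so the correlation graph is close to block-structured and each block can span more than one layer. Real importance scores are likely noisier, and the treatment of anti-correlated subcomponents and the choice of cluster count would need to follow the paper itself.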
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Saman_Seshadri1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 39