Keywords: Applications of interpretability, Interpretability tooling and software
Other Keywords: parameter decomposition
TL;DR: We extend recent Parameter Decomposition work to Transformers and show its applicability on a new toy model and GPT-2.
Abstract: Recent work in mechanistic interpretability has proposed decomposing model parameters rather than activations.
We extend Stochastic Parameter Decomposition (SPD) to Transformer models, proposing an updated causal importance function suited for sequential data.
We demonstrate that SPD can successfully decompose a toy induction-head model and recover the underlying computations.
We also show that applying SPD to GPT-2-small can successfully locate subcomponents corresponding to interpretable concepts like "golf" and "basketball".
This work takes a first step toward extending SPD to modern models, and shows that the method can surface interpretable parameter-space mechanisms.
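The abstract does not specify how the updated causal importance function handles sequential data; the sketch below is one plausible reading, not the paper's implementation. In SPD-style decompositions, a causal importance function predicts, for each datapoint, how important each parameter subcomponent is, and a stochastic mask is sampled between that importance and 1. For sequences, the natural extension is to compute importances per token position. All module and function names here are illustrative assumptions.

```python
# Hypothetical sketch: a per-token causal importance function for SPD-style
# subcomponent masking on sequential data. Names are illustrative, not the
# paper's actual implementation.
import torch
import torch.nn as nn

class TokenwiseCausalImportance(nn.Module):
    """Maps each token's activation to a [0, 1] importance per subcomponent."""
    def __init__(self, d_model: int, n_subcomponents: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, n_subcomponents),
        )

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # acts: (batch, seq, d_model) -> importances in [0, 1],
        # shape (batch, seq, n_subcomponents)
        return torch.sigmoid(self.mlp(acts))

def stochastic_mask(g: torch.Tensor) -> torch.Tensor:
    # Sample a mask uniformly between the predicted importance g and 1:
    # subcomponents deemed unimportant (g near 0) are heavily ablated,
    # while important ones (g near 1) are left nearly intact.
    u = torch.rand_like(g)
    return g + (1.0 - g) * u

# Usage: per-token masks for 128 rank-one subcomponents of one weight matrix.
imp = TokenwiseCausalImportance(d_model=768, n_subcomponents=128)
acts = torch.randn(2, 16, 768)        # (batch, seq, d_model)
mask = stochastic_mask(imp(acts))     # (2, 16, 128)
```

The key design choice this sketch highlights is that importance is a function of each position's activation rather than of the whole input, which is what "suited for sequential data" most plausibly requires.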
Submission Number: 205