Gluing Local Contexts into Global Meaning: A Sheaf-Theoretic Decomposition of Transformer Representations

Published: 01 Mar 2026, Last Modified: 01 Mar 2026
Venue: UCRL@ICLR2026 Poster
License: CC BY 4.0
Keywords: sheaf cohomology, transformer interpretability, representation decomposition, activation steering, mechanistic interpretability, spectral methods, paraphrase invariance
TL;DR: We decompose transformer representations into content-stable (H⁰) and context-dependent (H¹) subspaces via sheaf Laplacian spectral analysis
Abstract: We introduce a sheaf-theoretic decomposition that separates transformer representations into content-stable ($H^0$) and context-dependent ($H^1$) subspaces. Standard representations conflate what a sentence means with how it is phrased, yet no existing framework isolates these components. Our method constructs a cellular sheaf over paraphrase graphs and decomposes the resulting Laplacian spectrally: near-zero eigenspaces encode phrasing-invariant content, while maximal eigenspaces encode context-dependent variation. The decomposition proves functionally meaningful across five models spanning 124M to 13B parameters. $H^1$ dimensions exert 3.5 to 26.5 times greater causal influence on output distributions than variance-matched controls (Cohen's $d$ from 2.3 to 14.3). The $H^0$ subspace retrieves facts at 60 to 68 percent accuracy using only 20 dimensions, while $H^1$ ablation collapses generation and $H^0$ ablation leaves output unchanged. Beyond characterization, the decomposition serves as an architecture diagnostic. Llama-2-7B collapses under random perturbation with 4.2 percent fact preservation yet tolerates $H^1$-directed steering at 31 percent ($p < 10^{-50}$, $n = 1000$). Robust models tolerate both perturbation types. This diagnostic capability, which identifies architectures requiring structured intervention, emerges from an unsupervised geometric criterion without concept labels.
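For concreteness, the spectral machinery the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration under simplifying assumptions, not the authors' implementation: the function names (`sheaf_laplacian`, `spectral_split`), the toy triangle graph, and the identity restriction maps are all hypothetical choices introduced here; real use would build the graph from paraphrase pairs and derive the restriction maps and stalk vectors from model activations.

```python
# Minimal sketch: assemble a sheaf Laplacian over a paraphrase graph and
# split its spectrum into near-zero (content-stable) and maximal
# (context-dependent) eigenspaces. All names and data here are illustrative.
import numpy as np

def sheaf_laplacian(edges, n_nodes, restrictions, d):
    """Assemble the (n*d) x (n*d) sheaf Laplacian for a graph whose node
    stalks are all R^d. `restrictions` maps each edge e = (u, v) to a pair
    of d x d restriction maps (F_u, F_v) into the shared edge stalk."""
    L = np.zeros((n_nodes * d, n_nodes * d))
    for (u, v) in edges:
        F_u, F_v = restrictions[(u, v)]
        bu, bv = slice(u * d, (u + 1) * d), slice(v * d, (v + 1) * d)
        L[bu, bu] += F_u.T @ F_u   # diagonal blocks: sum over incident edges
        L[bv, bv] += F_v.T @ F_v
        L[bu, bv] -= F_u.T @ F_v   # off-diagonal blocks couple endpoints
        L[bv, bu] -= F_v.T @ F_u
    return L

def spectral_split(L, tol=1e-8, k_top=20):
    """Near-zero eigenvectors span the harmonic (phrasing-invariant)
    subspace; the largest eigenvectors span the most context-dependent
    directions."""
    eigvals, eigvecs = np.linalg.eigh(L)        # ascending eigenvalues
    content = eigvecs[:, eigvals < tol]          # harmonic / global sections
    context = eigvecs[:, -k_top:]                # maximal-eigenvalue directions
    return content, context

# Toy example: three paraphrases forming a triangle, 4-dim stalks,
# identity restriction maps (so L reduces to graph Laplacian tensor I_4).
d, edges = 4, [(0, 1), (1, 2), (0, 2)]
restrictions = {e: (np.eye(d), np.eye(d)) for e in edges}
L = sheaf_laplacian(edges, n_nodes=3, restrictions=restrictions, d=d)
content, context = spectral_split(L, k_top=4)
print(content.shape, context.shape)  # (12, 4) near-zero, (12, 4) maximal
```

With identity restriction maps the sheaf Laplacian degenerates to the graph Laplacian tensored with the identity, so the harmonic subspace is exactly the signals constant across paraphrases; nontrivial restriction maps are what make the near-zero eigenspace a genuinely sheaf-theoretic notion of shared content.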
Submission Number: 43