When Do Transformer Components Compose? Validating a Log-Pool Decomposition Criterion
Keywords: transformer decomposition, log-pool aggregation, mechanistic interpretability, metric validation, trustworthy machine learning, large language models, component attribution, parameter-scrambled controls, causal abstraction, philosophy of machine learning
TL;DR: We test a log-pool criterion for grouping transformer components by whether their induced token distributions compose with the full model. Across Llama, Pythia, and Gemma, it generalizes and separates trained models from scrambled-weight controls.
Abstract: A recent log-pool framework introduces a compositional utility gap as a criterion for decomposing a probability distribution into interacting contributions. We ask whether this gap, applied to architectural decompositions of trained transformers, has empirical content independent of any subagent interpretation. We measure the gap at three resolutions: atomic per-component gaps, random partitions, and partitions found by a genetic algorithm. A partition is feasible when its smallest group gap is non-negative. Held-out recomputation is stable, whereas cross-corpus transfer depends on resolution and architecture. Across Llama 3.1 8B, Pythia 1.4B, and Gemma 2 2B, random-partition feasibility differs sharply under three nulls (uniform, size-matched, residual-matched): it is permissive for Llama, rare for Pythia, and zero in every cell for Gemma. The genetic algorithm finds a feasible partition in all but one trained cell. Under per-tensor-$\sigma$-matched parameter-scrambled controls, both random and search feasibility collapse in every cell. Under these stated conventions, the gap is a substantive measurement tool for learned transformer structure; whether the measured property reflects subagent structure, distributional coherence, or broader decomposition coherence remains open.
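The feasibility rule stated in the abstract (a partition is feasible when its smallest group gap is non-negative) and the random-partition rate can be illustrated with a minimal Python sketch. Everything here beyond that rule is an assumption made for illustration: the names `is_feasible`, `random_partition`, and `feasibility_rate` are hypothetical, `gap_fn` is a placeholder for whatever procedure computes a group's compositional utility gap (it is not defined here), and the sampler only loosely mirrors a uniform null, not the size-matched or residual-matched ones.

```python
import random
from typing import Callable, List, Sequence

Partition = List[List[int]]


def is_feasible(group_gaps: Sequence[float]) -> bool:
    """Feasibility rule from the abstract: smallest group gap is non-negative."""
    return min(group_gaps) >= 0.0


def random_partition(n_components: int, n_groups: int, rng: random.Random) -> Partition:
    """Assumed uniform null: each component picks a group uniformly at random.

    Empty groups are dropped, so the result may have fewer than n_groups groups.
    """
    groups: Partition = [[] for _ in range(n_groups)]
    for component in range(n_components):
        groups[rng.randrange(n_groups)].append(component)
    return [g for g in groups if g]


def feasibility_rate(
    gap_fn: Callable[[List[int]], float],  # hypothetical: gap of one group
    n_components: int,
    n_groups: int,
    n_samples: int,
    seed: int = 0,
) -> float:
    """Fraction of sampled random partitions whose groups all have gap >= 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        partition = random_partition(n_components, n_groups, rng)
        if is_feasible([gap_fn(group) for group in partition]):
            hits += 1
    return hits / n_samples
```

In this sketch a rate near zero would correspond to the "rare" or "zero" cells described above, and a high rate to the "permissive" case; a search procedure such as a genetic algorithm would reuse `is_feasible` as its acceptance test.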
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 79