# 6. Conclusion

We introduced attention-head binding (EB\*) as a mechanistic interpretability metric for tracking concept emergence in language models. Applying this metric longitudinally to seven models spanning 117M–3B parameters, five architectures (GPT-2, GPT-NeoX, Dolma, LLaMA-3 GQA, Qwen2 GQA), and up to five random seeds (CRFM), we established four empirical findings over 41 web accessibility terms (N=205 prompts).

First, EB\* temporally precedes behavioral competence during training: at ≥1B parameters, 73–90% of terms show EB\* leading behavioral emergence by one checkpoint (C1-B; binomial p < 0.01). This lead-lag relationship is absent at 160M scale, where behavior leads instead, which points to a parameter threshold for the binding-precedence effect.

Second, models with high binding but low behavioral performance contain latent knowledge that few-shot prompting can unlock (C3; up to +61 pp improvement).

Third, binding and behavior decouple at late training checkpoints across multiple architectures: at 1B scale, ρ_late = −0.054; in SmolLM3-3B (3440k steps), ρ_late = −0.281, the deepest decoupling observed. A two-factor model emerges: a parameter threshold (~1B) governs decoupling depth, while a training-step threshold (~300k) governs temporal ordering (C4).

Fourth, targeted ablation across seven models reveals a **scale- and initialization-graded causal trajectory**: (1) *coupling*: binding heads are causally necessary (Pythia-160M: −11.2 pp); (2) *load-bearing*: causal necessity peaks at transitional scale (Pythia-1B: −15.1 pp); (3) *redundant/ceiling*: consolidated, distributed representations show near-zero ablation effect (OLMo, Qwen: ≈ −1 pp); (4) *initialization-sensitive boundary*: CRFM GPT-2 at 117M / 400k steps shows seed-dependent outcomes. Seed 1 yields a suppressor pattern (+20.9 pp when ablated, spec=−0.175), while seed 2 yields strong coupling (−23.4 pp, spec=+0.203), showing that small-model causal head function at training maturity is not deterministically fixed by scale and training duration (C5).
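To make the binomial test behind the first finding concrete, the following is a minimal sketch of a one-sided sign test on the number of terms where EB\* leads behavior. It is an illustration, not the analysis code used for C1-B. The null probability of 0.5, the one-sided alternative, and the counts 30 and 37 are assumptions; the counts are chosen only because 30/41 and 37/41 round to the reported 73% and 90%. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided tail P(X >= k) for X ~ Binomial(n, p0).

    Under the assumed null, each term is equally likely to show
    EB* leading or lagging behavioral emergence (p0 = 0.5).
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


N_TERMS = 41  # number of web accessibility terms stated in the text

# Hypothetical lead counts: 30/41 and 37/41 round to 73% and 90%.
for leading in (30, 37):
    share = leading / N_TERMS
    p_value = binomial_tail(leading, N_TERMS)
    print(f"{leading}/{N_TERMS} terms lead ({share:.0%}): one-sided p = {p_value:.2e}")
```

Under these assumptions, both counts fall below the 0.01 threshold. If ties or excluded terms reduce the effective sample size, `N_TERMS` would need to change accordingly.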

The initialization-sensitivity finding is our most unexpected result: opposite causal regimes (suppressor vs. load-bearing scaffold) emerge from identical architecture and training, differing only in random seed. This indicates that at 117M scale, the landscape of possible causal roles for binding heads is multi-stable, with initialization determining which attractor the model converges to. It also explains why single-seed conclusions about small-model causal structure are unreliable.

The binding-behavior decoupling effect remains our central contribution: C4 identifies it observationally across seven models and training regimes; C5 validates it causally with discriminant controls. Together they demonstrate that the relationship between internal mechanistic structure and external behavioral capability is not fixed but undergoes systematic, scale-graded transformation — a finding with implications for how we interpret, monitor, and develop language models.

Attention-head binding (EB\*) offers a simple, computationally tractable diagnostic that can be extracted from any transformer with attention access. We hope it proves useful both as an analytical tool for understanding concept emergence and as a practical monitoring signal in safety-critical domains, particularly web accessibility, where the integrity of internal representations directly affects AI systems serving users with disabilities.
