---
title: "Attention-Head Binding as a Mechanistic Marker of Accessibility Concept Emergence in Language Models"
author: "Anonymous Authors"
date: "2026"
---

# Abstract

We introduce *attention-head binding* (EB\*), a mechanistic interpretability metric that tracks how attention heads bind multi-token accessibility terms (e.g., "screen reader," "alt text") during training. We validate EB\* on a **41-term canonical accessibility register** (N=205 prompts) across **seven models** spanning five architectures (GPT-2, GPT-NeoX, OLMo, LLaMA-3 GQA, Qwen2 GQA; 117M–3B parameters), with up to **five random seeds** (CRFM GPT-2). The results establish a **two-factor representational lifecycle**: (1) a parameter threshold (~1B) governs decoupling depth, and (2) a training-step threshold (~300k) governs temporal ordering. At ≥1B scale, binding temporally precedes behavioral competence (**73–90% of terms show EB\*-leads-behavior**, C1-B binomial p < 0.01 across Pythia, OLMo, CRFM), while smaller models show anti-precedence due to insufficient training. Few-shot prompting unlocks latent knowledge with gains of **+18–37 percentage points** (C3). Modern models (SmolLM3, Qwen) show headroom compression: they reach the same absolute ceiling (~0.72) but a lower nominal Δ because their zero-shot baselines are already high. Targeted ablation on the canonical dataset reveals a **scale-graded causal trajectory**: coupled at 160M (−11.2 pp, spec=+0.137), maximally load-bearing at 1B (−15.1 pp, spec=+0.117), and redundant or at ceiling at 2.8B and in OLMo and Qwen (spec≈0). Causal function is also **initialization-sensitive at small scale**: CRFM GPT-2 (117M) shows 4/5 seeds coupled but 1/5 acting as a suppressor (spec=−0.175), demonstrating that causal head function is not deterministically fixed by architecture alone. These findings establish attention binding as a cross-architecture diagnostic for concept emergence and reveal that mechanistic-behavioral relationships undergo systematic, scale-graded transformation.
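As a rough illustration of the kind of binomial check behind the precedence claim, the sketch below computes a one-sided exact binomial p-value with the Python standard library. It makes several assumptions that the abstract does not state: the null hypothesis is that a term is equally likely to lead or lag (p = 0.5), the test is one-sided, and the 73% lower bound corresponds to roughly 30 of the 41 terms. The function name `binomial_tail` is hypothetical, and this is not necessarily the exact C1-B procedure.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_TERMS = 41  # size of the canonical register, from the abstract
LEADS = 30    # ASSUMED count: roughly 73% of 41 terms (abstract gives only percentages)

# Null hypothesis (assumption): EB* leads behavior for a term with probability 0.5.
p_value = binomial_tail(LEADS, N_TERMS)
print(f"{LEADS}/{N_TERMS} terms lead: one-sided p = {p_value:.4f}")
```

Under these assumptions, even the lower end of the reported 73–90% range falls below the 0.01 threshold, and higher counts give smaller p-values still.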

---

**Structure:**

- §1 Introduction → `sections/introduction.md`
- §2 Related Work → `sections/related_work.md`
- §3 Methods → `sections/methods.md`
- §4 Results → `sections/results.md`
- §5 Discussion → `sections/discussion.md`
- §6 Conclusion → `sections/conclusion.md`
- Appendix → `appendix/`
- References → `references.md`
