Abstract: We present the first systematic analysis of attention heads for syntactic relations in decoder-only Transformer language models. Prior work has demonstrated that encoder-only and encoder-decoder architectures contain attention heads aligned with single-hop syntactic relations, but the internal mechanisms of decoder-only models remain underexplored. Focusing on two representative families (GPT-2 and XGLM) across five model sizes (117M, 345M, 774M, 1.5B, and 1.7B parameters), we identify a novel class of attention heads that capture multi-hop dependency paths (MDPs), e.g., “obl+case”. Through controlled head ablation on the BLiMP benchmark, we show that removing 25\% of MDP heads induces a 7.1\% drop in average grammaticality accuracy, compared to only a 1.6\% drop when ablating the same number of conventional, single-hop syntactic heads. Crucially, this pattern holds consistently across all five model sizes, demonstrating the robustness of our findings. Technically, we (i) extend existing head-identification methods, previously limited to encoder-only and encoder-decoder models, to the decoder-only setting, and (ii) propose a formal definition and detection algorithm for MDP heads. Our results reveal that decoder-only Transformers internalize syntactic information in more complex, non-canonical forms than previously understood, underscoring the importance of cross-chain interactions for grammatical competence.
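To make the head-ablation protocol concrete, the sketch below shows one way to zero out a set of attention heads in a decoder-only model and compare sentence log-probabilities on a BLiMP-style minimal pair. This is a minimal illustration, not the authors' code: it assumes the HuggingFace `transformers` GPT-2 implementation, and the (layer, head) indices are hypothetical placeholders rather than the MDP heads identified in the paper.

```python
# Minimal sketch of attention-head ablation for grammaticality scoring,
# assuming the HuggingFace `transformers` GPT-2 model (117M parameters).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

# Hypothetical (layer, head) pairs standing in for identified MDP heads.
ABLATED_HEADS = [(3, 5), (7, 2), (9, 11)]

# head_mask has shape (num_layers, num_heads); a 0 nullifies that head.
head_mask = torch.ones(model.config.n_layer, model.config.n_head)
for layer, head in ABLATED_HEADS:
    head_mask[layer, head] = 0.0

def sentence_logprob(sentence, mask=None):
    """Sum of token log-probabilities under the (optionally ablated) model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids, head_mask=mask).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    return logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).sum().item()

# BLiMP-style minimal pair: the model is "correct" if it assigns higher
# probability to the grammatical sentence than to the ungrammatical one.
good = "The cats near the door are asleep."
bad = "The cats near the door is asleep."
for mask, tag in [(None, "full"), (head_mask, "ablated")]:
    correct = sentence_logprob(good, mask) > sentence_logprob(bad, mask)
    print(f"{tag}: prefers grammatical sentence = {correct}")
```

Aggregating this preference judgment over all minimal pairs in a BLiMP phenomenon, with and without the mask, yields the accuracy drops reported in the abstract.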
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: probing, explanation faithfulness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3265