Keywords: Language models, mechanistic interpretability, causal intervention, long-distance dependency
Abstract: Prior work has demonstrated that language models encode syntactic structure, but the operations by which they form syntactic dependencies remain poorly understood.
This paper uses activation patching to investigate the internal procedures underlying long-distance dependency formation.
Analyzing four dependency types across model sizes, we find that small models rely on broadly similar attention-based heuristics, whereas larger models exhibit differentiated operational pipelines: non-displacement dependencies involve attention-based marking of structurally illicit positions, while displacement dependencies do not.
These patterns are robust to dependency length.
Our results suggest that increasing model size leads to a human-like distinction between displacement and non-displacement dependencies, implemented via different internal operations.
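For readers unfamiliar with the method, the following is a minimal, illustrative sketch of activation patching on a small GPT-2 model. The minimal pair (a subject-verb agreement dependency), the patched layer, and the logit-difference metric are assumptions chosen for exposition, not the paper's actual stimuli or experimental setup.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Minimal pair for a non-displacement (agreement) dependency; both
# sentences tokenize to the same length, so positions align.
clean = tokenizer("The keys to the cabinet", return_tensors="pt")
corrupt = tokenizer("The key to the cabinet", return_tensors="pt")

LAYER = 6  # hypothetical layer to patch, chosen only for illustration

# Step 1: cache the clean run's hidden states at the chosen block.
cache = {}
def save_hook(module, args, output):
    cache["clean"] = output[0].detach()

handle = model.transformer.h[LAYER].register_forward_hook(save_hook)
with torch.no_grad():
    model(**clean)
handle.remove()

# Step 2: rerun on the corrupted input, swapping in the clean
# activation at that block, and read off the verb prediction.
def patch_hook(module, args, output):
    return (cache["clean"],) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(**corrupt).logits
handle.remove()

# If patching restores the plural-agreement preference, the patched
# activation carries the number information for this dependency.
are_id = tokenizer(" are")["input_ids"][0]
is_id = tokenizer(" is")["input_ids"][0]
print("logit(are) - logit(is):",
      (patched_logits[0, -1, are_id] - patched_logits[0, -1, is_id]).item())

In this style of analysis, repeating the patch across layers, positions, or individual attention heads localizes where the dependency-relevant information is carried; the single-layer residual-stream patch above is the simplest variant.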
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, computational psycholinguistics
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 10636