Keywords: Impossible grammars, hierarchical grammars, interpretability, attribution patching
TL;DR: Language models have disjoint mechanisms for processing hierarchical and linear grammars.
Abstract: All natural languages contain hierarchical structure. In humans, this structural restriction is neurologically encoded: when presented with linear and hierarchical grammars with identical vocabularies, brain areas responsible for language processing are sensitive only to the hierarchical grammar. In this study, we investigate whether such functionally specialized grammar-processing regions can emerge in large language models (LLMs), whose processing mechanisms are formed solely through exposure to language corpora. We prompt transformer-based autoregressive LLMs to judge the grammaticality of hierarchical and linear grammars in an in-context-learning setup. First, we find that models achieve higher accuracy, and lower or comparable surprisal, on hierarchical grammars. Next, we use attribution patching to show that the model components processing hierarchical and linear grammars are distinct. Lastly, ablating the components identified for hierarchical or linear grammars selectively reduces accuracy on the corresponding grammar. Our findings indicate that large-scale text exposure alone can lead to functional specialization in LLMs.
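The abstract's localization step relies on attribution patching, which scores components by a first-order approximation to activation patching: the effect of swapping a component's activation from a "clean" run into a "corrupted" run is estimated as (clean activation − corrupted activation) · gradient, so one backward pass scores every component. The sketch below is a minimal, hedged illustration of this idea on a toy transformer, not the authors' code; the toy model, the random placeholder prompts standing in for hierarchical/linear grammar examples, and the answer-logit metric are all assumptions made for the example.

```python
# Minimal sketch of attribution patching on a toy model (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=2, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        a, _ = self.attn(x, x, x, need_weights=False)
        x = x + a
        return x + self.mlp(x)

class ToyLM(nn.Module):
    def __init__(self, vocab=50, d=32, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.blocks = nn.ModuleList(ToyBlock(d) for _ in range(n_layers))
        self.unembed = nn.Linear(d, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        cache = []                      # residual stream after each block
        for block in self.blocks:
            x = block(x)
            cache.append(x)
        return self.unembed(x), cache

def attribution_patching_scores(model, clean_tokens, corrupt_tokens, metric):
    """Approximate patching effect per block: (clean - corrupt) . d(metric)/d(corrupt)."""
    with torch.no_grad():
        _, clean_cache = model(clean_tokens)
    logits, corrupt_cache = model(corrupt_tokens)
    grads = torch.autograd.grad(metric(logits), corrupt_cache)
    return [((c - k) * g).sum().item()
            for c, k, g in zip(clean_cache, corrupt_cache, grads)]

if __name__ == "__main__":
    model = ToyLM()
    # Placeholder prompts: in the study these would be matched grammatical /
    # ungrammatical sequences from a hierarchical or linear grammar.
    clean = torch.randint(0, 50, (1, 12))
    corrupt = torch.randint(0, 50, (1, 12))
    # Hypothetical metric: logit of a "grammatical" answer token at the last position.
    metric = lambda logits: logits[0, -1, 7]
    print(attribution_patching_scores(model, clean, corrupt, metric))
```

Running the same scoring on prompts from each grammar type and comparing which components receive large attributions is one way the "distinct components" claim could be probed; the per-block granularity here is a simplification of per-head or per-neuron analyses.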
Submission Number: 43