Transformers Learn Different Parsing Strategies across Languages

ACL ARR 2026 January Submission2872 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Models, Syntax, Constituency Parsing, Parsing Strategy
Abstract: Transformer-based language models (LMs) are trained purely for next-word prediction, yet they exhibit sensitivity to syntax. Little is known, however, about how they internally parse syntactic structure. Recent work has probed autoregressive LMs via an arc-standard shift-reduce dependency parser, revealing incremental syntactic states in LM representations; this methodology, however, covers only a single dependency parsing strategy and cannot tell us which of the many possible parsing strategies is most compatible with autoregressive LM representations. In this paper, we extend the incremental probing methodology to constituency structures and investigate which of the top-down, bottom-up, and left-corner strategies best explains the internal parsing process of autoregressive LMs. Our empirical results suggest that LMs implicitly learn different parsing strategies for different languages, with top-down being most prevalent in English and left-corner in Japanese.
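The three strategies named in the abstract differ in the order in which a parser commits to nonterminals relative to the words beneath them. As an illustrative sketch (not the authors' probing code), the following derives oracle action sequences for a toy constituency tree under each strategy; the action names `NT`/`SHIFT`/`REDUCE`/`PROJECT`/`COMPLETE` and the nested-tuple tree encoding are assumptions for exposition.

```python
# Illustrative oracle action sequences for three constituency parsing
# strategies. A tree is a nested tuple: (label, child, child, ...);
# a bare string is a terminal (word).

def topdown(t):
    """Predict a nonterminal before any of its words (pre-order)."""
    if isinstance(t, str):
        return [f"SHIFT({t})"]
    label, *kids = t
    acts = [f"NT({label})"]            # open the constituent first
    for k in kids:
        acts += topdown(k)
    return acts + ["REDUCE"]           # close the open constituent

def bottomup(t):
    """Predict a nonterminal only after all its words (post-order)."""
    if isinstance(t, str):
        return [f"SHIFT({t})"]
    label, *kids = t
    acts = []
    for k in kids:
        acts += bottomup(k)
    return acts + [f"REDUCE({label})"]  # label once children are built

def leftcorner(t):
    """Predict a nonterminal after its first child (its left corner)."""
    if isinstance(t, str):
        return [f"SHIFT({t})"]
    label, *kids = t
    acts = leftcorner(kids[0]) + [f"PROJECT({label})"]
    for k in kids[1:]:
        acts += leftcorner(k)
    return acts + ["COMPLETE"]

tree = ("S", ("NP", "the", "dog"), ("VP", "barks"))
print(topdown(tree))     # NT(S) precedes every word
print(bottomup(tree))    # REDUCE(S) follows every word
print(leftcorner(tree))  # PROJECT(NP) right after "the"
```

The contrast is visible immediately: top-down emits `NT(S)` before any word, bottom-up emits `REDUCE(S)` only at the very end, and left-corner projects each nonterminal as soon as its first child is complete, which is the intermediate, incremental behavior often attributed to human sentence processing.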
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: probing
Contribution Types: Model analysis & interpretability
Languages Studied: English, Japanese
Submission Number: 2872