Structural sensitivity does not entail grammaticality: assessing LLMs against UHFH

Published: 03 Oct 2025, Last Modified: 13 Nov 2025
CPL 2025 Poster
License: CC BY 4.0
Keywords: grammaticality, syntax, llms, psycholinguistics
TL;DR: The study explores whether Large Language Models (LLMs) can generalize syntactic rules, taking into account the universal hierarchy of functional heads (UHFH).
Abstract: The study explores whether Large Language Models (LLMs) can generalize the universal hierarchy of functional heads (UHFH), a cross-linguistic syntactic pattern argued to be rooted in human cognition (Cinque 1999). We focus on Italian Restructuring Verbs (RVs). Four LLMs were evaluated: Mistral-7B-v0.3, GPT2-small, GePpeTto, and Minerva-7B-base-v1.0. GePpeTto and Minerva-7B-base-v1.0 were trained on Italian corpora, while Mistral-7B-v0.3 and GPT2-small were trained primarily on English data. Results suggest that hierarchical awareness acts more as a heuristic than as a diagnostic of grammatical well-formedness. Notably, GePpeTto, a smaller GPT-2-style model trained on Italian data, outperformed all larger models, including the 7B-parameter Mistral and Minerva architectures (Tab. 1). This finding challenges the common assumption that larger size universally yields better syntactic abstraction and suggests that language-specific training and vocabulary alignment may play a more decisive role in domains that rely on typologically grounded syntactic contrasts. The study concludes that LLMs can learn structural tendencies aligned with linguistic hierarchies, but these tendencies do not necessarily amount to proper grammaticality judgments.
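The abstract does not spell out the evaluation protocol. A common way to probe contrasts of this kind is to compare summed token log-probabilities over minimal pairs that differ only in the order of the restructuring/functional verbs, taking the higher-scoring order as the model's preference. The sketch below is a minimal illustration of that approach, assuming the Hugging Face checkpoint LorenzoDeMattei/GePpeTto and placeholder Italian sentences; neither the checkpoint identifier nor the stimuli are taken from the paper.

# Hypothetical sketch: score two candidate verb orders by summed token
# log-probability under a causal LM. The checkpoint name and the example
# sentences are illustrative assumptions, not the paper's stimuli or protocol.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed Hugging Face identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Return the summed log P(token | preceding tokens) for the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over the predicted positions (all tokens except the first).
        out = model(**inputs, labels=inputs["input_ids"])
    n_predicted = inputs["input_ids"].size(1) - 1
    return -out.loss.item() * n_predicted  # mean NLL * positions = total log-prob (negated)

# Placeholder minimal pair differing only in the order of the two modal/restructuring verbs.
# Which order the UHFH predicts to be well-formed is the empirical question the paper tests.
order_a = "Gianni vuole poter tornare a casa."
order_b = "Gianni puo' voler tornare a casa."

score_a, score_b = sentence_logprob(order_a), sentence_logprob(order_b)
print(f"order_a: {score_a:.2f}  order_b: {score_b:.2f}  preferred: {'a' if score_a > score_b else 'b'}")

Under this kind of setup, a model counts as "hierarchy-aware" only in the weak sense of assigning higher probability to the UHFH-consistent order; as the abstract stresses, such a preference is a heuristic and does not by itself constitute a grammaticality judgment.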
Submission Number: 56