Abstract: Do Large Language Models (LLMs) produce output that exhibits syntactic productivity similar to human language? Although recent work has focused on quantifying the lexical, n-gram, or templatic novelty of LLM output with respect to its training data, we posit that the problem is formally equivalent to a central issue in child language research, where conclusions about the underlying grammar must be drawn solely from a child's production data. We apply a mathematically rigorous and independently validated measure of Syntactic Productivity, the combinatorial diversity of Determiner-Noun (DxN) pairs used to assess young children's developing grammars, to four OpenAI LLMs whose training data is inaccessible. We find that children, their caretakers, and professional writers show the statistical hallmark of Syntactic Productivity, but LLM-generated texts do not.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Syntactic diversity, syntactic productivity, combinatorial diversity, Zipf's law, LLMs, LLM-generated text, language acquisition, overlap
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 5122