Abstract: To characterize the linguistic properties of AI-generated text, we ask: do Large Language Models (LLMs) produce output that exhibits syntactic properties similar to those of human language? The problem is formally equivalent to a major issue in child language research, where conclusions about the underlying grammar must be drawn solely from a child's production data. We apply a mathematically rigorous and independently validated benchmark to quantify syntactic productivity, with specific focus on Determiner-Noun (DxN) combinations. Human language corpora show the statistical profile of syntactic productivity, but LLM-generated texts do not.
Paper Type: Short
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, benchmarking, evaluation, metrics
Contribution Types: Theory
Languages Studied: English
Submission Number: 6339