Abstract: To characterize the linguistic properties of AI-generated text, we ask: do Large Language Models (LLMs) produce output that exhibits syntactic properties similar to those of human language? The problem is formally equivalent to a major issue in child language research, where conclusions about the underlying grammar must be drawn solely from a child's production data. We apply a mathematically rigorous and independently validated benchmark to quantify syntactic productivity, with specific focus on Determiner-Noun (DxN) combinations. Human language corpora show the statistical profile of syntactic productivity, but LLM-generated texts do not.
Paper Type: Short
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, benchmarking, evaluation, metrics
Contribution Types: Theory
Languages Studied: English
Submission Number: 6339