Abstract: Large language models (LLMs) are applied in diverse contexts of our lives, including child education. Here, we evaluate the ability of an LLM to generate child-like language by comparing an LLM-based corpus to the Litkey Corpus, a collection of German children's writings based on picture stories. We generated a parallel LLM-based corpus using identical visual prompts and conducted a comparative analysis across word frequency distributions, lexical richness, and semantic representations. This study explores if and how children and LLMs differ in psycholinguistic aspects of text, in order to evaluate the potential influence of LLM-generated text on child development. The results show that, while the LLM-based texts are longer, their vocabulary is less rich, their words contain more letters, and they lack words from the medium- and low-frequency ranges (i.e., they rely primarily on frequently occurring words). Additionally, vector space analysis using semantic word embeddings reveals low semantic similarity, highlighting differences between the two corpora at the level of corpus semantics. These findings contribute to our understanding of LLM-generated language and its limitations in modeling child language, with implications for LLM usage in psycholinguistics and educational applications.
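The comparison measures named in the abstract (lexical richness, word length, and embedding-based semantic similarity) can be illustrated with a minimal sketch. This is not the paper's actual implementation; the toy token lists and the three-dimensional embedding vectors below are purely hypothetical stand-ins for the real corpora and word embeddings.

```python
import math

def type_token_ratio(tokens):
    # Lexical richness: number of unique word types divided by total tokens.
    return len(set(tokens)) / len(tokens)

def mean_word_length(tokens):
    # Average number of letters per token.
    return sum(len(t) for t in tokens) / len(tokens)

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors (e.g., mean corpus vectors).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical mini-samples standing in for a child corpus and an LLM corpus.
child_tokens = "der hund läuft und der hund bellt".split()
llm_tokens = "der aufgeweckte hund rennt fröhlich und der hund bellt laut".split()

print(type_token_ratio(child_tokens), type_token_ratio(llm_tokens))
print(mean_word_length(child_tokens), mean_word_length(llm_tokens))
# Hypothetical 3-dimensional corpus-level embedding vectors.
print(cosine_similarity([1.0, 0.0, 0.5], [0.9, 0.1, 0.4]))
```

In practice such comparisons would run over the full corpora with trained word embeddings (and normalization, tokenization, and frequency binning the sketch omits), but the arithmetic is the same.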
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, cognitive modeling, computational psycholinguistics
Contribution Types: Model analysis & interpretability
Languages Studied: German
Submission Number: 3889