Abstract: As synthetic data becomes increasingly prevalent in training large language models (LLMs), concerns have emerged that these models may deviate from authentic human language patterns, potentially losing the richness and creativity inherent in human communication. This highlights the critical need to assess how humanlike LLMs' language use is in real-world settings. In this paper, we present the \textbf{H}uman language \textbf{L}ikeness \textbf{B}enchmark (HLB), a comprehensive evaluation of 20 LLMs using psycholinguistic experiments designed to probe core linguistic dimensions: phonology, lexical processing, syntax, semantics, and discourse. To contextualize model performance, we collected responses from over 2,000 human participants as a baseline and compared them to the outputs generated by the models.
For rigorous evaluation, we developed a coding algorithm that accurately identified language use patterns, enabling the extraction of response distributions for each task. By comparing the response distributions of human participants and LLMs, we quantified humanlikeness through distributional similarity. Our results reveal fine-grained differences in how well LLMs replicate human responses across linguistic levels. Importantly, we found that improvements on other performance metrics did not necessarily lead to greater humanlikeness and in some cases even resulted in a decline. By introducing psycholinguistic methods to model evaluation, this benchmark offers the first framework for systematically assessing the humanlikeness of LLMs in language use (see Figure 19 for the leaderboard). Code and data will be released upon acceptance.
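A minimal sketch of the distributional-similarity comparison described above, under the assumption that coded responses for each task are tallied into probability distributions and compared with a bounded divergence (Jensen-Shannon divergence is used here purely for illustration; the paper does not specify the metric, and the category labels and counts below are hypothetical):

```python
import numpy as np

def response_distribution(responses, categories):
    """Turn a list of coded responses into a probability distribution
    over a fixed set of response categories."""
    counts = np.array([responses.count(c) for c in categories], dtype=float)
    return counts / counts.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2), bounded in [0, 1]."""
    p, q = np.asarray(p, dtype=float) + eps, np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical coded responses for one psycholinguistic item.
categories = ["reading_A", "reading_B", "other"]
human = ["reading_A"] * 70 + ["reading_B"] * 25 + ["other"] * 5
model = ["reading_A"] * 55 + ["reading_B"] * 40 + ["other"] * 5

similarity = 1.0 - js_divergence(response_distribution(human, categories),
                                 response_distribution(model, categories))
print(f"Humanlikeness (distributional similarity): {similarity:.3f}")
```

Because the base-2 Jensen-Shannon divergence lies in [0, 1], subtracting it from 1 yields a similarity score where 1 indicates identical human and model response distributions.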
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: computational psycholinguistics, cognitive modeling
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 7996