Large Language Models are Human-Like Internally

Published: 03 Oct 2025, Last Modified: 13 Nov 2025
Venue: CPL 2025 Talk
License: CC BY 4.0
Keywords: online sentence processing, surprisal theory, language models, LLM interpretability
Abstract: This study extends the scope of LM-based cognitive modeling to the internal layers of neural LMs, whereas existing studies have focused exclusively on the model's final output probabilities. Using techniques from mechanistic interpretability for extracting surprisal from internal layers, we systematically evaluate how well internal-layer surprisal fits human measures, across 30 LMs and 15 datasets of recorded human reading behavior and physiology. The results yield new findings: for example, the final layer's surprisal is typically a worse predictor of human reading behavior than that of earlier layers, and viewed through the lens of internal-layer surprisal, larger language models appear more human-like than previously thought.
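The abstract refers to extracting surprisal from internal layers of a causal LM. Below is a minimal sketch of one common technique for this, the "logit lens" (projecting intermediate hidden states through the model's final layer norm and unembedding matrix). The model name and the choice of the logit lens are illustrative assumptions, not necessarily the paper's exact setup.

```python
# Sketch: per-layer surprisal via the "logit lens" (assumed technique, not
# necessarily the authors' exact method). Works for HF GPT-2-style models.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with a final LayerNorm + LM head
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The horse raced past the barn fell."
ids = tokenizer(text, return_tensors="pt").input_ids  # shape (1, seq_len)

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

layer_surprisals = []
for hidden in out.hidden_states:  # embedding output + one entry per layer
    # Logit lens: reuse the final LayerNorm and unembedding on intermediate
    # states. (The last entry already has ln_f applied, so for the final
    # layer this is only approximate.)
    logits = model.lm_head(model.transformer.ln_f(hidden))  # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    # Surprisal of each token given its prefix: -log2 p(w_t | w_<t)
    target = ids[:, 1:]
    nll = -log_probs[:, :-1, :].gather(-1, target.unsqueeze(-1)).squeeze(-1)
    layer_surprisals.append(nll / math.log(2.0))  # convert nats to bits

# layer_surprisals[k][0, i] is the surprisal (in bits) that layer k assigns to
# token i+1; these per-layer values can then be regressed against human
# reading-time or physiological measures.
```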
Submission Number: 59