An empirical study on modelling Working Memory constraints in Transformers

ACL ARR 2026 January Submission 10725 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Cognitive Modelling, Attention Mechanisms, Inductive Bias, Psycholinguistics, Language Modelling
Abstract: We investigate the integration of human-like working memory constraints into the Transformer architecture and implement several cognitively inspired attention variants, including fixed-width window-based and temporal decay-based attention mechanisms. Our modified GPT-2 models are trained from scratch on developmentally plausible datasets (10M and 100M words). Performance is evaluated on grammatical judgment tasks (BLiMP) and on alignment with human reading time data. Our results indicate that these cognitively inspired constraints, particularly fixed-width attention, can significantly improve grammatical accuracy, especially when training data is scarce. These constrained models also tend to show stronger alignment with human processing metrics. The findings suggest that such constraints may serve as a beneficial inductive bias, guiding models towards more robust linguistic representations, especially in data-limited settings.
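The abstract does not spell out how the attention variants are implemented; the sketch below is one minimal way such constraints are commonly realised, assuming additive masks/biases applied to the pre-softmax attention scores. The function names, the `window` width, and the `rate` parameter are illustrative choices, not the authors' actual code.

```python
import torch

def fixed_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # Causal mask that also forbids attending more than `window` tokens back,
    # mimicking a fixed-capacity working memory.
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    allowed = (j <= i) & (i - j < window)
    mask = torch.full((seq_len, seq_len), float("-inf"))
    mask[allowed] = 0.0
    return mask

def temporal_decay_bias(seq_len: int, rate: float) -> torch.Tensor:
    # Additive bias that penalises attention to older tokens in proportion to
    # their distance, a simple stand-in for memory decay (ALiBi-style slopes
    # are one concrete instantiation of this idea).
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    dist = (i - j).clamp(min=0).float()
    bias = -rate * dist
    bias[j > i] = float("-inf")  # keep the mechanism causal
    return bias

# Either tensor can be added to the raw attention scores before the softmax, e.g.:
# scores = q @ k.transpose(-2, -1) / d_head**0.5 + fixed_window_mask(T, window=64)
```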
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: cognitive modeling; computational psycholinguistics; language modelling
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 10725