Keywords: Probing, Other
Other Keywords: Cognitive Science, Active Memory Search
TL;DR: We show semantic foraging mechanisms, critical to human performance in active memory search tasks, emerge as identifiable patterns in LLMs.
Abstract: Like humans, large language models hold a vast repository of semantic memories. Efficient and strategic access to this memory store is a critical foundation for a variety of human cognitive functions. Memory search has therefore been a research focus since the early days of psychology, and its computational mechanisms are well characterized.
Much of this understanding has been gleaned from a widely used neuropsychological and cognitive science assessment called the Semantic Foraging Task (SFT), which requires participants to generate as many concepts as possible that satisfy a semantic constraint.
Our goal is to apply mechanistic interpretability techniques to bring greater rigor to the study of semantic memory foraging in LLMs. To this end, we present preliminary results that use the SFT as a case study and compare LLM performance with human performance. We focus on convergent and divergent patterns of generative memory search, which in humans play complementary strategic roles in efficient memory foraging. We show that these same behavioral signatures, critical to human performance on the SFT, also emerge as identifiable patterns in LLMs across distinct layers. This analysis may provide new insights into how LLMs could be adapted into closer cognitive alignment with humans or, alternatively, guided toward productive cognitive disalignment to enhance complementary strengths in human–AI interaction.
Submission Number: 277