Keywords: Probing, Other
Other Keywords: Cognitive Science, Active Memory Search
TL;DR: We show semantic foraging mechanisms, critical to human performance in active memory search tasks, emerge as identifiable patterns in LLMs.
Abstract: Both humans and Large Language Models (LLMs) store a vast repository of
semantic memories. In humans, efficient and strategic access to this memory
store is a critical foundation for a variety of cognitive functions. Such access
has long been a focus of psychology and the computational mechanisms behind
it are now well characterized. Much of this understanding has been gleaned
from a widely used neuropsychological and cognitive science assessment called
the Semantic Fluency Task (SFT), which requires the generation of as many
semantically constrained concepts as possible. Our goal is to apply mechanistic
interpretability techniques to bring greater rigor to the study of semantic memory
foraging in LLMs. To this end, we present preliminary results examining the SFT as a
case study. A central focus is on convergent and divergent patterns of generative
memory search, which in humans play complementary strategic roles in efficient
memory foraging. We show that these same behavioral signatures, critical to human
performance on the SFT, also emerge as identifiable patterns in LLMs across
distinct layers. This analysis may provide new insights into how LLMs could
be adapted toward closer cognitive alignment with humans or, alternatively, guided
toward productive cognitive disalignment that enhances complementary strengths in
human–AI interaction.
Submission Number: 277