Keywords: Large Language Models, Mechanistic Interpretability, Factual Recall, Transformer Circuits, Knowledge Retrieval
Abstract: Large language models (LLMs) store and recall factual knowledge, yet the precise mechanism by which entity representations are transformed to enable specific attribute retrieval remains underexplored. In this work, we investigate this mechanism through the lens of an "attribute-computation path"—a sequence of computational steps over the entity representation required to elicit a target attribute. We then propose an iterative patching protocol to identify a minimal subset of layers necessary for this computation. Applying our method to LLaMA 3.1 8B and Qwen 3 8B, we find that these paths are non-contiguous, often skipping layers, and that models possess multiple, functionally equivalent paths for the same entity and fact, highlighting a high degree of redundancy in attribute computation. This implies that knowledge computation is highly distributed, potentially explaining the localization-editing mismatch and suggesting that knowledge storage and retrieval in LLMs are far from well understood.
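To make the iterative patching protocol concrete, here is a minimal sketch of one plausible reading of it: cache per-layer activations from a clean factual prompt, patch them into a corrupted run at the final token position, then greedily drop layers whose patch is unnecessary for recovering the target attribute. This is an illustration under stated assumptions, not the authors' implementation; the model name, prompts, helper functions, and the greedy pruning order are all hypothetical.

```python
# Minimal sketch of an iterative layer-patching loop (illustrative, not the
# paper's code). Assumes a HuggingFace decoder-only model; patches are applied
# at the final token position to sidestep prompt-length mismatches.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # one of the two models studied
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def layer_outputs(prompt: str) -> list[torch.Tensor]:
    """Cache each decoder layer's output hidden states for a clean run."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output; [1:] are per-layer outputs.
    return list(out.hidden_states[1:])

def run_with_patches(prompt: str, patches: dict[int, torch.Tensor]) -> torch.Tensor:
    """Run `prompt`, overwriting each patched layer's last-token output."""
    handles = []
    def make_hook(clean: torch.Tensor):
        def hook(_module, _inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            # Splice the cached clean activation in at the final position.
            hidden[:, -1, :].copy_(clean[:, -1, :])
        return hook
    for idx, clean in patches.items():
        layer = model.model.layers[idx]
        handles.append(layer.register_forward_hook(make_hook(clean)))
    try:
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
    finally:
        for h in handles:
            h.remove()
    return logits

# Greedy pruning: start with every layer patched, then drop any layer whose
# removal still lets the corrupted run produce the target attribute token.
clean_prompt = "The Eiffel Tower is located in the city of"
corrupt_prompt = "The Colosseum is located in the city of"
target_id = tok(" Paris", add_special_tokens=False).input_ids[0]

clean_acts = layer_outputs(clean_prompt)
kept = set(range(len(clean_acts)))
for idx in sorted(kept):
    trial = {i: clean_acts[i] for i in kept if i != idx}
    if run_with_patches(corrupt_prompt, trial).argmax().item() == target_id:
        kept.discard(idx)  # layer idx is unnecessary for this path
print("Minimal patched layer set:", sorted(kept))
```

Under this reading, a non-contiguous surviving set (e.g., layers {5, 6, 14, 21}) would correspond to the layer-skipping paths the abstract reports, and rerunning the pruning loop in a different order could surface an alternative, functionally equivalent set.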
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: knowledge tracing/discovering/inducing, counterfactual/contrastive explanations
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 5395