Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms

ACL ARR 2025 May Submission 6461 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Understanding how Transformer-based language models store and retrieve factual associations is critical for improving interpretability and enabling targeted model editing. Prior work, conducted primarily on GPT-style models, has identified MLP modules in early layers as the key contributors to factual recall. However, it remains unclear whether these findings generalize across autoregressive architectures. To address this, we conduct a comprehensive evaluation of factual recall across several model families (GPT, LLaMA, Qwen, and DeepSeek), analyzing where and how factual information is encoded and accessed. Notably, we find that Qwen-based models deviate from previously reported patterns: their earliest-layer attention modules contribute more to factual recall than their MLP modules do. Our findings suggest that even within the autoregressive Transformer family, architectural variations can lead to fundamentally different mechanisms of factual recall.
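To make the kind of analysis the abstract describes concrete, below is a minimal sketch of one way to compare MLP versus attention contributions to factual recall: zero-ablating each module in a layer and measuring the drop in the probability of the correct fact token. The model (`gpt2`), the prompt, and the zero-ablation strategy are illustrative assumptions, not the paper's exact tracing method.

```python
# Sketch: compare MLP vs. attention contributions to factual recall by
# zero-ablating each module and measuring the drop in P(target token).
# Model choice, prompt, and ablation strategy are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The Eiffel Tower is located in the city of"
target = " Paris"  # fact completion whose probability we track
ids = tok(prompt, return_tensors="pt").input_ids
target_id = tok(target).input_ids[0]

def prob_of_target(hooks=()):
    """Run the model (optionally with ablation hooks) and return P(target)."""
    handles = [mod.register_forward_hook(fn) for mod, fn in hooks]
    try:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
    finally:
        for h in handles:
            h.remove()
    return torch.softmax(logits, dim=-1)[target_id].item()

def zero_output(module, inputs, output):
    """Zero-ablate a module's output; attention blocks return a tuple."""
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

base = prob_of_target()
print(f"baseline P({target!r}) = {base:.4f}")

# Ablate MLP and attention separately in each early layer; a larger drop
# indicates a larger contribution of that module to recalling the fact.
for layer in range(5):
    block = model.transformer.h[layer]
    p_mlp = prob_of_target([(block.mlp, zero_output)])
    p_attn = prob_of_target([(block.attn, zero_output)])
    print(f"layer {layer}: drop w/o MLP = {base - p_mlp:+.4f}, "
          f"drop w/o attention = {base - p_attn:+.4f}")
```

Under the paper's finding, a GPT-style model would show larger early-layer drops for MLP ablation, while a Qwen-style model would show larger drops for attention ablation.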
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Knowledge tracing
Contribution Types: Model analysis & interpretability, Reproduction study
Languages Studied: English
Submission Number: 6461