Reproducibility study of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals"

TMLR Paper4320 Authors

22 Feb 2025 (modified: 10 Jul 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: This paper presents a reproducibility study of Ortu et al. (2024), investigating the competition between the factual recall and counterfactual in-context adaptation mechanisms in GPT-2. We extend the original authors' experiments with softmax-normalized logits as an additional metric for tracking how token scores evolve through the model. Our reproduced and extended experiments validate the original paper's main claims regarding the location of the competition of mechanisms in GPT-2, i.e., that the competition emerges predominantly in the later layers and is driven by the attention blocks, in particular by a subset of specialized attention heads. Additionally, we explore intervention strategies based on attention modification to increase factual accuracy. We find that simultaneously boosting multiple attention heads involved in factual recall can have a synergistic effect on factual accuracy, which is further enhanced by suppressing copy heads. Finally, we rework how the competition of mechanisms is conceptualized and find that the specialized factual recall heads identified by Ortu et al. (2024) act as copy regulators, penalizing counterfactual in-context adaptation and rewarding the copying of factual information.
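A minimal sketch of the softmax-normalized logit metric described in the abstract, assuming a standard Hugging Face GPT-2 checkpoint; the prompt and the factual/counterfactual token choices are illustrative placeholders, not the exact stimuli used in the study:

```python
# Sketch: compare factual vs. counterfactual next-token scores after
# softmax normalization, so scores are bounded probabilities rather
# than raw logits.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Counterfactual prompt: the context redefines a well-known fact.
prompt = "Redefine: the Eiffel Tower is in Rome. The Eiffel Tower is in"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

probs = torch.softmax(logits, dim=-1)

factual_id = tokenizer.encode(" Paris")[0]        # factual token
counterfactual_id = tokenizer.encode(" Rome")[0]  # in-context token
print(f"p(factual)        = {probs[factual_id]:.4f}")
print(f"p(counterfactual) = {probs[counterfactual_id]:.4f}")
```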
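The attention-modification interventions can be sketched in a similar spirit using transformer_lens hooks; the layer/head indices and scaling factors below are hypothetical placeholders, not the specific heads or values identified in the paper:

```python
# Sketch: boost one attention head and suppress another by scaling
# their post-softmax attention patterns during the forward pass.
import functools
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

def scale_head(pattern, hook, head, alpha):
    # pattern: [batch, n_heads, query_pos, key_pos];
    # alpha > 1 boosts the head, alpha < 1 suppresses it.
    pattern[:, head] = pattern[:, head] * alpha
    return pattern

prompt = "Redefine: the Eiffel Tower is in Rome. The Eiffel Tower is in"
fwd_hooks = [
    # Boost a head assumed (for illustration) to support factual recall ...
    ("blocks.10.attn.hook_pattern",
     functools.partial(scale_head, head=7, alpha=5.0)),
    # ... and suppress a head assumed to copy from the counterfactual context.
    ("blocks.9.attn.hook_pattern",
     functools.partial(scale_head, head=9, alpha=0.0)),
]

with torch.no_grad():
    logits = model.run_with_hooks(prompt, fwd_hooks=fwd_hooks)

probs = torch.softmax(logits[0, -1], dim=-1)
print("p(' Paris'):", probs[model.to_single_token(" Paris")].item())
print("p(' Rome') :", probs[model.to_single_token(" Rome")].item())
```

Scaling the post-softmax pattern is only one of several possible intervention points; scaling pre-softmax attention scores would yield a different (renormalized) redistribution of attention.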
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Removed the colon from the title; de-anonymized, camera-ready version.
Assigned Action Editor: ~Alberto_Bietti1
Submission Number: 4320