Reproducibility study of: "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals"

TMLR Paper 4320 Authors

22 Feb 2025 (modified: 10 Jun 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: This paper presents a reproducibility study of Ortu et al. (2024), investigating the competition between the factual recall and counterfactual in-context adaptation mechanisms in GPT-2. We extend the original authors' experiments with softmax-normalized logits as an additional metric for gauging how token scores evolve across the model's layers. Our reproduced and extended experiments validate the original paper's main claims regarding the location of the competition of mechanisms in GPT-2, i.e., that the competition emerges predominantly in later layers and is driven by the attention blocks, specifically by a subset of specialized attention heads. Additionally, we explore intervention strategies based on attention modification to increase factual accuracy. We find that simultaneously boosting multiple attention heads involved in factual recall can have a synergistic effect on factual accuracy, which is further enhanced by suppressing copy heads. Finally, we find that the specialized factual recall heads identified by Ortu et al. (2024) act as copy regulators, penalizing counterfactual in-context adaptation and rewarding the copying of factual information.
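As a concrete illustration of the softmax-normalized logit inspection mentioned in the abstract, the following is a minimal sketch, assuming GPT-2 via the Hugging Face transformers library. The prompt and the factual/counterfactual target tokens are hypothetical placeholders for illustration, not items drawn from the paper's datasets, and this is not the authors' code.

```python
# Minimal sketch: project each layer's residual stream at the final position
# through GPT-2's final layer norm and unembedding, then softmax-normalize so
# the factual and counterfactual tokens can be compared on a probability scale
# rather than via raw logits. Prompt and target tokens are illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Redefine: the iPhone is developed by Google. The iPhone is developed by"
factual_id = tokenizer.encode(" Apple")[0]          # factual target token
counterfactual_id = tokenizer.encode(" Google")[0]  # in-context (counterfactual) token

with torch.no_grad():
    out = model(**tokenizer(prompt, return_tensors="pt"), output_hidden_states=True)

# hidden_states[0] is the embedding output; entries 1..12 are the layer outputs.
for layer, hidden in enumerate(out.hidden_states):
    resid = model.transformer.ln_f(hidden[0, -1])
    probs = torch.softmax(model.lm_head(resid), dim=-1)
    print(f"layer {layer:2d}: p(factual)={probs[factual_id].item():.4f} "
          f"p(counterfactual)={probs[counterfactual_id].item():.4f}")
```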
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Added a file (changes.md) to the GitHub repository containing all the changes we made to run the original code.
- Moved what was Section 4.9 (Baselines for factual boosting) to an appendix. While there is value in this experiment, as it proposes a new way to talk about the uniqueness of the roles of attention heads, we agree that the insight that attention heads can take on different roles has been sufficiently explored in other works. Note that this means what was previously Section 4.10 is now Section 4.9; please take that into account when reading this response.
- Revised Section 4.6 to be clearer; it now concludes by stating that softmax normalization is not directly used in the rest of the paper, as the conclusions drawn in this section largely mirror those obtained with raw logits.
- Introduced new names in Section 3.4 to better differentiate the two datasets used in the (softmax-normalized) logit inspection and attention modification experiments. We call these CF-Juxtaposed (the dataset provided by Ortu et al.) and CF-Tracing (a dataset developed by Neel Nanda).
- In Section 4.9, clarified that the purpose of the experiment is to isolate the effects of the factual recall mechanism, and discussed more critically what our results imply about whether such a mechanism exists independently of a copy mechanism.
- Added an appendix showing a grid search over suppression values as a sanity check.
Assigned Action Editor: ~Alberto_Bietti1
Submission Number: 4320