Moving the Eiffel Tower to ROME: Tracing and Editing Facts in GPT


16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: We investigate the mechanisms underlying factual knowledge recall in auto-regressive transformer language models. To this end, we develop a method for identifying neuron activations that can alter a model's factual predictions. Within GPT-2, this reveals two distinct sets of neurons that we hypothesize correspond, respectively, to knowing an abstract fact and saying a concrete word. Based on this insight, we propose ROME, a simple and efficient rank-one model editing method for rewriting abstract facts in auto-regressive language models. For validation, we introduce CounterFact, a dataset of over twenty thousand rewritable facts, together with tools for sensitive measurement of edit quality. Compared to previously published knowledge editing methods, ROME achieves superior generalization and specificity.
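The core idea of a rank-one model edit can be illustrated with a toy sketch. This is a simplified illustration, not the paper's exact procedure: the names `rank_one_edit`, `k_star`, and `v_star` are assumptions for exposition, and the full method constrains the update with a covariance-weighted least-squares objective that this sketch omits. Given an MLP projection weight `W`, a key vector `k*` representing a subject (e.g. "Eiffel Tower") and a target value vector `v*` encoding the rewritten fact (e.g. "is in Rome"), a rank-one outer-product update makes the edited matrix map `k*` to `v*` exactly:

```python
import numpy as np

def rank_one_edit(W: np.ndarray, k_star: np.ndarray, v_star: np.ndarray) -> np.ndarray:
    """Return W plus a rank-one update so that the edited matrix maps k* to v*.

    Toy version: W' = W + (v* - W k*) k*^T / (k*^T k*), so W' k* = v*.
    """
    residual = v_star - W @ k_star                     # what the current weights get wrong
    update = np.outer(residual, k_star) / (k_star @ k_star)  # rank-one correction
    return W + update

# Usage: edit a random weight matrix so it maps k to the desired v.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
k = rng.normal(size=3)
v = rng.normal(size=4)
W_edited = rank_one_edit(W, k, v)
```

Because the update has rank one, it changes the model's behavior only along the direction of `k*`, which is what gives this family of methods its specificity: inputs nearly orthogonal to `k*` are almost unaffected.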