Initializing and Retrofitting Key-Value Adaptors for Traceable Model Editing


ACL ARR 2025 February Submission1720 Authors

14 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: As our understanding of how language models store knowledge deepens, the ability to perform CRUD (Create, Read, Update, Delete) operations on language models becomes increasingly indispensable for managing rapidly changing knowledge. Because fine-tuning language models is costly, low-cost model editing methods are usually needed to manipulate a model's knowledge. Evidence suggests that the knowledge-carrying modules in a Transformer are primarily the MLP blocks; we therefore propose iReVa, a method that explicitly initializes and retrofits key-value pairs into MLP blocks to construct a new mapping for a piece of knowledge without damaging irrelevant knowledge. Compared with existing methods, iReVa offers better interpretability and a stronger capacity for carrying traceable edits. Experimental results on a range of GPT-series models show strong performance on edit success and generalization without compromising specificity. We also make the first attempt to conduct a knowledge withdrawal test with iReVa. Our code is available at https://anonymous.4open.science/r/iReVa-6CFD.
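To make the key-value view of an MLP block concrete, here is a minimal sketch in plain Python. It is not iReVa's implementation. It assumes the common reading of an MLP block as a key-value memory (output = sum over i of relu(k_i . x) * v_i), and the class names `BaseMLP` and `KeyValueAdaptor`, the cosine-similarity `threshold`, and the closed-form slot initialization are hypothetical choices made for illustration. Each edit adds one key-value slot tagged with an edit ID, so the edit stays traceable and can be withdrawn by deleting that slot. The retrofitting (training) stage is omitted.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def relu(z):
    return max(0.0, z)


class BaseMLP:
    """Toy frozen MLP viewed as key-value memory: out = sum_i relu(k_i . x) * v_i."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values

    def __call__(self, x):
        out = [0.0] * len(self.values[0])
        for k, v in zip(self.keys, self.values):
            a = relu(dot(k, x))
            out = [o + a * vi for o, vi in zip(out, v)]
        return out


class KeyValueAdaptor:
    """Hypothetical adaptor: extra key-value slots placed alongside a frozen MLP."""

    def __init__(self, base, threshold=0.9):
        self.base = base
        # Assumption: a slot fires only when cos-sim(input, key) exceeds this threshold.
        self.threshold = threshold
        self.slots = {}  # edit_id -> (key, value); one slot per edit keeps edits traceable

    def insert(self, edit_id, x_edit, y_target):
        # Initialize the key as the normalized input representation of the edit.
        key = normalize(x_edit)
        # Initialize the value as the correction needed at x_edit, since the
        # slot activation below equals 1 when the input points exactly along the key.
        y_old = self.base(x_edit)
        value = [t - o for t, o in zip(y_target, y_old)]
        self.slots[edit_id] = (key, value)

    def withdraw(self, edit_id):
        # Knowledge withdrawal: removing the slot restores the original mapping.
        del self.slots[edit_id]

    def __call__(self, x):
        out = self.base(x)
        xn = normalize(x)
        for key, value in self.slots.values():
            # Rescaled ReLU gate: 0 below the threshold, 1 at cos-sim == 1.
            a = relu((dot(key, xn) - self.threshold) / (1.0 - self.threshold))
            out = [o + a * v for o, v in zip(out, value)]
        return out


if __name__ == "__main__":
    base = BaseMLP(keys=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                   values=[[1.0, 0.0], [0.0, 1.0]])
    adaptor = KeyValueAdaptor(base, threshold=0.9)
    x_edit, x_other = [1.0, 1.0, 1.0], [1.0, 0.0, 0.0]

    adaptor.insert("edit-1", x_edit, [5.0, -2.0])
    print([round(v, 6) for v in adaptor(x_edit)])   # edited mapping: [5.0, -2.0]
    print(adaptor(x_other))                         # unrelated input unchanged: [1.0, 0.0]

    adaptor.withdraw("edit-1")
    print(adaptor(x_edit))                          # original mapping restored: [1.0, 1.0]
```

Because each slot fires only when the input's cosine similarity to its key exceeds the threshold, unrelated inputs pass through the frozen MLP unchanged, which loosely mirrors the specificity goal. A real editor would operate on hidden states inside a Transformer layer rather than on toy vectors, and would tune the slots rather than rely on this one-shot initialization alone.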

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: natural language processing, model editing, language model, key-value adaptor

Contribution Types: Theory

Languages Studied: English

Submission Number: 1720
