Initializing and Retrofitting Key-Value Adaptors for Traceable Model Editing

15 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: natural language processing, model editing, language model, key-value adaptor
Abstract: As insight into how knowledge is stored in language models deepens, the ability to perform CRUD (Create, Read, Update, Delete) operations on language models becomes increasingly indispensable for managing rapidly changing knowledge. Given the high cost of fine-tuning language models, low-cost model editing methods are typically required to manipulate a model's knowledge. Evidence suggests that knowledge in a Transformer is carried primarily by the MLP blocks; we therefore propose \textbf{iReVa}, a method that explicitly initializes and retrofits key-value pairs into MLP blocks to construct a new mapping for a piece of knowledge without damaging irrelevant knowledge. Compared with existing methods, iReVa offers better interpretability and a stronger capacity for carrying traceable edits. Experimental results on a series of GPT models show strong performance on edit success and generalization without degrading specificity. We also make the first attempt at a knowledge-withdrawal test with iReVa. Our code is available at github.com/thartvigsen/grace.
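The abstract gives no implementation details, but the underlying idea, treating an MLP block as a key-value memory and retrofitting one extra key-value pair into it, can be illustrated with a minimal PyTorch sketch. This is only an assumed reading of the general technique, not the paper's actual method; the function name retrofit_key_value and the argument names mlp_up, mlp_down, key, and value are hypothetical.

```python
import torch
import torch.nn as nn

def retrofit_key_value(mlp_up: nn.Linear, mlp_down: nn.Linear,
                       key: torch.Tensor, value: torch.Tensor):
    """Append one (key, value) pair as an extra hidden unit in an MLP block.

    Hypothetical sketch (not the paper's code):
      mlp_up:   d_model -> d_ff projection; each row acts as a key.
      mlp_down: d_ff -> d_model projection; each column acts as a value.
      key:      (d_model,) vector intended to activate the new unit.
      value:    (d_model,) vector written to the output when the key fires.
    """
    d_model = mlp_up.in_features

    # Widen the up-projection by one row that holds the new key.
    new_up = nn.Linear(d_model, mlp_up.out_features + 1,
                       bias=mlp_up.bias is not None)
    new_up.weight.data[:-1] = mlp_up.weight.data
    new_up.weight.data[-1] = key
    if mlp_up.bias is not None:
        new_up.bias.data[:-1] = mlp_up.bias.data
        new_up.bias.data[-1] = 0.0

    # Widen the down-projection by one column that holds the new value.
    new_down = nn.Linear(mlp_down.in_features + 1, d_model,
                         bias=mlp_down.bias is not None)
    new_down.weight.data[:, :-1] = mlp_down.weight.data
    new_down.weight.data[:, -1] = value
    if mlp_down.bias is not None:
        new_down.bias.data = mlp_down.bias.data.clone()

    return new_up, new_down
```

Because the edit is a single appended unit, it remains traceable: the new key and value occupy a known index and could later be removed to withdraw the edit, which is consistent with the knowledge-withdrawal test the abstract mentions.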
Primary Area: Natural language processing
Submission Number: 15035