REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

ACL ARR 2025 February Submission 6136 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large language model editing methods frequently encounter overfitting, wherein factual updates disproportionately influence the model's broader behavior, causing it to adhere rigidly to the edited target regardless of the query context. To address this challenge, we introduce \textbf{REACT} (\underline{R}epresentation \underline{E}xtraction \underline{A}nd \underline{C}ontrollable \underline{T}uning), a dual-phase framework designed for precise and scalable knowledge editing. In the initial phase, we utilize tailored stimuli with Principal Component Analysis to extract latent factual representations and derive a directional “belief shift” vector. In the subsequent phase, a pre-trained classifier guides the selective perturbation of hidden states via a learned scalar, ensuring that modifications remain confined to relevant regions of the latent space. This strategy is further refined through a composite loss function that balances editing and localization objectives, ultimately integrating new information effectively while preserving unrelated knowledge. Empirical evaluations on COUNTERFACT, MQuAKE, and EVOKE benchmarks demonstrate that \textbf{REACT} significantly mitigates overfitting and enhances reliability, portability, and generality across diverse editing scenarios.
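The abstract's two-phase procedure can be illustrated with a minimal sketch: PCA over hidden-state differences yields a "belief shift" direction, and a classifier-gated scalar controls how strongly a hidden state is perturbed along it. All names below (`extract_belief_shift`, `gated_edit`, the stand-in relevance score, and the toy data) are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of the two-phase idea described in the abstract, using NumPy.
# Function names, the scalar relevance gate, and the toy data are assumptions,
# not the paper's actual REACT implementation.
import numpy as np

def extract_belief_shift(h_edited: np.ndarray, h_original: np.ndarray) -> np.ndarray:
    """Phase 1: PCA over hidden-state differences between stimuli that do and
    do not express the new fact; the top principal component acts as the
    directional 'belief shift' vector."""
    diffs = h_edited - h_original                      # (n_stimuli, d_hidden)
    diffs = diffs - diffs.mean(axis=0, keepdims=True)  # center before PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                                       # unit-norm top component

def gated_edit(h: np.ndarray, shift: np.ndarray,
               relevance: float, scale: float) -> np.ndarray:
    """Phase 2: perturb a hidden state along the belief-shift direction,
    scaled by a learned scalar and gated by a classifier's relevance score,
    so unrelated queries (relevance near 0) are left essentially untouched."""
    return h + scale * relevance * shift

# Toy usage: random vectors stand in for model hidden states.
rng = np.random.default_rng(0)
h_orig = rng.normal(size=(32, 768))
h_edit = h_orig + 0.5 * rng.normal(size=(1, 768))  # shared offset plus noise
v = extract_belief_shift(h_edit, h_orig)
h_query = rng.normal(size=768)
h_new = gated_edit(h_query, v, relevance=0.9, scale=1.5)
```

In the paper the relevance score would come from the pre-trained classifier and the scale from the learned scalar; here both are fixed numbers purely for illustration.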
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Knowledge Tracing/Discovering/Inducing, Probing, Robustness
Languages Studied: English
Submission Number: 6136