Reinforced Lifelong Editing for Language Models

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose RLEdit, a hypernetwork-based lifelong model editing method that achieves both effectiveness and efficiency.
Abstract: Large language models (LLMs) acquire information from pre-training corpora, but their stored knowledge can become inaccurate or outdated over time. Model editing addresses this challenge by modifying model parameters without retraining, and prevalent approaches leverage hypernetworks to generate these parameter updates. However, such approaches struggle with lifelong editing because they fail to account for LLM parameters that change dynamically as edits accumulate. To address this, we observe that hypernetwork-based lifelong editing aligns naturally with reinforcement learning and propose **RLEdit**, an RL-based editing method. By treating editing losses as rewards and optimizing hypernetwork parameters at the level of the full knowledge sequence, RLEdit precisely captures LLM changes and generates appropriate parameter updates. Our extensive empirical evaluation across several LLMs demonstrates that RLEdit outperforms existing lifelong editing methods in both effectiveness and efficiency, achieving a **59.24%** improvement while requiring only **2.11%** of the time compared to most approaches.
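To make the sequence-level training idea concrete, below is a minimal, self-contained PyTorch sketch of the general pattern the abstract describes: a hypernetwork generates an update for an edited weight at each step, per-edit losses (playing the role of negative rewards) are accumulated over the whole edit sequence, and the hypernetwork is optimized against that sequence-level objective. Everything here is an illustrative assumption rather than the paper's actual architecture or training procedure: the toy linear layer standing in for an LLM weight, the gradient-based input features, and the network sizes are all hypothetical.

```python
# Minimal sketch: sequence-level training of an editing hypernetwork.
# All names, sizes, and the reward definition are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

D = 16  # toy hidden size of the edited layer (assumption)

# Stand-in for one editable LLM weight matrix.
base_weight = torch.randn(D, D)

# Hypernetwork: maps a gradient signal from the current edit to a
# parameter update for the edited weight (sizes are arbitrary).
hypernet = nn.Sequential(
    nn.Linear(D * D, 64), nn.Tanh(), nn.Linear(64, D * D)
)
opt = torch.optim.Adam(hypernet.parameters(), lr=1e-3)

def edit_loss(weight, x, y):
    # Squared error of the edited layer on one edit example (x -> y).
    return ((x @ weight - y) ** 2).mean()

# A toy sequence of edits: each is an (input, desired output) pair.
edit_sequence = [(torch.randn(D), torch.randn(D)) for _ in range(8)]

for step in range(200):
    weight = base_weight.clone()
    sequence_loss = 0.0
    for x, y in edit_sequence:
        # The gradient of the edit loss w.r.t. the *current* (already
        # edited) weight is the hypernetwork's view of the model state.
        w = weight.detach().requires_grad_(True)
        grad_signal = torch.autograd.grad(edit_loss(w, x, y), w)[0]
        delta = hypernet(grad_signal.flatten()).view(D, D)
        weight = weight + delta  # apply the generated update
        # Post-edit loss acts as a negative reward; summing it over the
        # sequence gives a sequence-level objective.
        sequence_loss = sequence_loss + edit_loss(weight, x, y)
    opt.zero_grad()
    sequence_loss.backward()  # one optimizer step per whole sequence
    opt.step()
```

The contrast with per-edit hypernetwork training is that the optimizer step here happens once per sequence, so the hypernetwork is trained on weights it has already modified many times, which is the compatibility problem the abstract attributes to prior hypernetwork-based editors.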
Lay Summary: Large Language Models (LLMs) learn vast amounts of information, but this knowledge can become outdated or incorrect over time, and constantly retraining them from scratch is expensive and time-consuming. To address this, researchers use "model editing" to directly tweak an LLM's internal settings. Many current editing tools, however, struggle when making continuous updates because the LLM's internal state changes with each edit. Our research introduces a new method called **RLEdit**. We noticed that this challenge of ongoing editing resembles the way models learn from rewards, an approach known as reinforcement learning. RLEdit uses this idea to better track the LLM's changes and make more precise updates. In our tests, RLEdit was significantly more effective and much faster at these lifelong edits than existing methods, improving updating accuracy substantially while requiring only a small fraction of the time. This helps keep LLMs accurate and up-to-date efficiently.
Primary Area: Deep Learning->Large Language Models
Keywords: Model Editing, Meta-learning, Large Language Model
Submission Number: 5585