Keywords: LLM Personalization, Mechanistic Interpretability, Knowledge Graph
Abstract: As large language models (LLMs) become central to user-facing applications, effective personalization (adapting models to individual users' evolving facts and contexts) has become crucial. However, existing approaches struggle with mutable personal knowledge: finetuning can embed static user information but is costly and prone to catastrophic forgetting, while knowledge editing methods rely on representations pre-cached from large corpora such as Wikipedia, which are unavailable or unsuitable for personal domains due to data scarcity and privacy concerns. We formalize fact-level personalization with mutable knowledge as a new task, constructing synthetic Personal Knowledge Graphs (PKGs) that capture user information across time points to evaluate models' ability to incorporate updates without degrading existing knowledge. Drawing on insights from mechanistic interpretability, we find that personal facts are encoded in localized circuits within LLMs.
We propose SPIKE (Steering for Personalized Knowledge Injection), a parameter-efficient method that combines adapter modules with steering-based activation injection, targeting identified personal knowledge circuits. This approach enables the precise integration of new user-specific facts, including previously unseen triples, while maintaining the integrity of prior knowledge. Our experiments demonstrate that SPIKE effectively balances the accuracy of incorporating new facts with the preservation of existing knowledge, offering a practical solution for continual personalization in settings where user information evolves frequently.
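The steering-based activation injection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `apply_steering`, the derivation of the steering vector as a mean activation difference, and the scaling parameter `alpha` are illustrative assumptions; in practice the vector would be injected at the layers and token positions identified as the personal knowledge circuit.

```python
import numpy as np

def apply_steering(hidden, steering_vec, alpha=1.0, positions=None):
    """Add a scaled steering vector to hidden states at selected positions.

    hidden: (seq_len, d_model) activations from a targeted layer.
    positions: token indices belonging to the identified circuit
               (None steers every position). All names are hypothetical.
    """
    out = hidden.copy()
    idx = slice(None) if positions is None else positions
    out[idx] = out[idx] + alpha * steering_vec
    return out

# Hypothetical steering vector: contrast activations recalling the old fact
# against activations recalling the updated fact, then average over tokens.
old_act = np.ones((4, 8))         # stand-in activations for the old fact
new_act = np.full((4, 8), 3.0)    # stand-in activations for the new fact
steer = (new_act - old_act).mean(axis=0)

# Inject only at the last token position of the identified circuit.
steered = apply_steering(old_act, steer, alpha=0.5, positions=[3])
```

In a real model this injection would typically be registered as a forward hook on the targeted layer, so the adapter modules and the steering vector act jointly during generation.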
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18321