Abstract: Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge.
Previous works attribute this conflict to the interplay between "memory heads" and "context heads", attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the *superposition of contextual information and parametric memory*, where highly influential attention heads simultaneously contribute to both memory and context. Building on this insight, we propose Just Run Twice (JuICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JuICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate superposition effects. Extensive experiments across 11 datasets and 6 model architectures show that JuICE achieves new state-of-the-art performance with robust generalization, delivering significant and consistent improvements across domains and conflict types. Finally, we theoretically analyze knowledge conflict and the superposition of contextual information and parametric memory in attention heads, further elucidating the effectiveness of JuICE in these settings. Our code is available at https://github.com/GaotangLi/JUICE.
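The exact head-selection procedure and steering rule of JuICE are specified in the paper; the sketch below only illustrates the general dual-run attention-head intervention pattern it builds on, assuming a HuggingFace Llama-style decoder. The model name, the `(layer, head)` pairs, and the scaling factor `ALPHA` are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch: run the model twice, once unmodified and once with selected
# attention heads rescaled via a forward pre-hook on each layer's o_proj.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"   # assumption: any decoder with .self_attn.o_proj
TARGET_HEADS = {(12, 3), (15, 7)}    # hypothetical (layer, head) pairs to steer
ALPHA = 2.0                          # hypothetical steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
head_dim = model.config.hidden_size // model.config.num_attention_heads

def make_hook(layer_idx, scale):
    # The input to o_proj is the concatenation of per-head outputs, so slicing
    # the last dimension isolates and rescales individual heads.
    def hook(module, args):
        (hidden,) = args
        hidden = hidden.clone()
        for (l, h) in TARGET_HEADS:
            if l == layer_idx:
                hidden[..., h * head_dim:(h + 1) * head_dim] *= scale
        return (hidden,)
    return hook

def run(prompt, scale=1.0):
    handles = [
        layer.self_attn.o_proj.register_forward_pre_hook(make_hook(i, scale))
        for i, layer in enumerate(model.model.layers)
    ]
    try:
        inputs = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        return tok.decode(out[0], skip_special_tokens=True)
    finally:
        for h in handles:
            h.remove()

prompt = ("Context: The capital of France is Lyon. "
          "Question: What is the capital of France? Answer:")
baseline = run(prompt, scale=1.0)   # first run: unmodified forward pass
steered = run(prompt, scale=ALPHA)  # second run: amplify the selected heads
print(baseline, steered, sep="\n")
```

In this sketch the two runs differ only in whether the selected heads are rescaled; JuICE additionally uses the pair of runs to counteract the superposition effect described above.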
Lay Summary: Language models frequently encounter "knowledge conflicts," where their pre-trained knowledge contradicts information provided in a specific context. These conflicts often arise in context-dependent systems such as retrieval-augmented generation and tool-augmented language models. But what exactly happens inside the model during these conflicts? Can we find solutions by examining the model's internal mechanisms?
Our research uncovers an unexpected phenomenon we call the superposition of contextual information and parametric memory, in which critical components (specifically, attention heads) simultaneously influence both stored knowledge and contextual information, without clearly favoring one over the other. Through rigorous empirical and theoretical studies on carefully designed synthetic datasets, we validate the presence and implications of this superposition phenomenon.
Building upon these findings, we introduce a lightweight, training-free method to reliably steer language models toward either their pre-existing knowledge or context-specific information, depending on what a task requires. Our analysis provides a rigorous and unified framework for future research on knowledge conflicts. Furthermore, our insights connect to concurrent work on superposition and offer an effective, practical intervention against it, which could benefit other tasks and help address a significant obstacle in the interpretability community.
Link To Code: https://github.com/GaotangLi/JUICE
Primary Area: Deep Learning->Large Language Models
Keywords: Knowledge Conflict, Mechanistic Interpretability, Science of Large Language Models
Submission Number: 2610