Keywords: Knowledge Conflict, Retrieval-Augmented Generation, Representation Editing
Abstract: Large Language Models (LLMs) may not always provide accurate responses to user queries, owing to the staleness of training data and the presence of noise. To address this, Retrieval-Augmented Generation (RAG) has been widely adopted, enabling LLMs to ground their responses in external knowledge sources. Nonetheless, recent studies show that conflicts between the retrieved external knowledge and the model's parametric knowledge can lead to hallucinatory outputs, and this problem is exacerbated when the retrieved documents contain noise. In this work, we propose Conflict-Aware Representation Editing (CARE), a method designed to generate robust responses even when the retrieval includes documents with low relevance to the query. CARE aims to produce conflict-resilient responses by editing the internal representations of the model. Assuming that LLMs encode distinguishable internal patterns indicative of knowledge conflicts, we introduce an autoencoder into the model's internal layers to detect such patterns. We then modulate neuron activations accordingly, steering the model to generate responses unaffected by knowledge conflicts. We evaluate CARE across six Question Answering (QA) benchmarks and four LLMs, demonstrating its superior performance over existing methods.
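The core mechanism described in the abstract, detecting anomalous hidden states with an autoencoder and then steering activations, can be illustrated with a minimal sketch. This is a toy illustration, not the paper's implementation: the autoencoder here uses a fixed orthonormal basis instead of trained weights, and `edit_representation`, the `threshold`, and the steering vector are all hypothetical names and values chosen for demonstration.

```python
import numpy as np

class LinearAutoencoder:
    """Toy linear autoencoder with a fixed orthonormal code basis.
    In CARE the autoencoder would be trained so that conflict-free
    hidden states reconstruct well; here the basis is hand-picked
    purely for illustration."""
    def __init__(self, hidden_dim, code_dim):
        # Project onto the first `code_dim` coordinate axes (tied weights).
        self.enc = np.eye(hidden_dim)[:, :code_dim]
        self.dec = self.enc.T

    def reconstruct(self, h):
        return h @ self.enc @ self.dec

    def error(self, h):
        # Reconstruction error: how far h lies outside the "familiar" subspace.
        return float(np.linalg.norm(h - self.reconstruct(h)))

def edit_representation(h, ae, steer, threshold=0.5, alpha=1.0):
    """High reconstruction error flags a potential knowledge-conflict
    region; in that case, nudge the hidden state along a steering
    direction. Otherwise pass it through unchanged."""
    if ae.error(h) > threshold:
        return h + alpha * steer
    return h

ae = LinearAutoencoder(hidden_dim=4, code_dim=2)
steer = np.array([0.0, 0.0, 1.0, 0.0])  # hypothetical steering direction

h_clean = np.array([1.0, 2.0, 0.0, 0.0])     # lies inside the code subspace
h_conflict = np.array([1.0, 2.0, 3.0, 0.0])  # large off-subspace component

print(edit_representation(h_clean, ae, steer))     # unchanged: [1. 2. 0. 0.]
print(edit_representation(h_conflict, ae, steer))  # steered:   [1. 2. 4. 0.]
```

The key design choice mirrored here is that detection and intervention are decoupled: the autoencoder only scores how atypical a representation is, and the steering step fires only when that score crosses a threshold, leaving ordinary activations untouched.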
Primary Area: interpretability and explainable AI
Submission Number: 6885