Keywords: Knowledge Conflict, Context-memory Conflict, Model-Based Evaluation
Abstract: Large language models (LLMs) rely on both contextual knowledge and parametric memory, yet these sources can conflict.
Prior analyses have largely focused on contextual question answering and suggest that models tend to favor parametric knowledge under conflict, but this setting assumes that tasks should always rely on the provided passage.
It therefore remains unclear how LLMs behave when \emph{tasks demand different kinds and degrees of knowledge utilization}.
We address this gap with a model-agnostic diagnostic framework that holds underlying knowledge constant while injecting controlled conflicts across tasks with varying knowledge requirements.
Evaluating representative open-source LLMs, we find that:
(1) performance degradation under conflict correlates with a task’s knowledge reliance rather than conflict plausibility alone;
(2) strategies such as explanatory rationales or reiteration increase context reliance, helping context-only tasks but harming those that require parametric knowledge; and
(3) these behaviors bias model-based evaluation, raising concerns about the reliability of LLMs as judges.
Together, our findings show that context–memory conflict is fundamentally task-dependent and motivate task-aware approaches to balancing context and memory in LLM deployment and evaluation.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Interpretability and Analysis of Models for NLP, Resources and Evaluation, Generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 6214