Abstract: Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models (LLMs) with relevant and up-to-date information. However, the retrieved sources can often contain conflicting information, and it remains unclear how models should address such discrepancies. In this work, we first propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting. CONFLICTS is the first benchmark that enables tracking progress on how models address a wide range of knowledge conflicts. We conduct extensive experiments on this benchmark, showing that LLMs often struggle to appropriately resolve conflicts between sources. While prompting LLMs to explicitly reason about potential conflicts in the retrieved documents significantly improves the quality and appropriateness of their responses, substantial room for improvement remains for future research.