CEDA: Cross-modal Evaluation through Debate Agents for Robust Hallucination Detection

Published: 06 Oct 2025 · Last Modified: 04 Nov 2025 · MTI-LLM @ NeurIPS 2025 Poster · CC BY-ND 4.0
Keywords: LLM-as-a-Judge, Retrieval Augmented Generation, Multi-Agent Debate, Hallucination Detection, Uncertainty Quantification
TL;DR: CEDA introduces a novel framework for detecting hallucinations in language models by combining structured agent debates with calibrated classification, achieving state-of-the-art performance while providing explanations and confidence scores.
Abstract: We present CEDA, a novel multimodal framework for detecting hallucinations in large language model outputs through multi-agent debate. While existing methods for black-box LLMs often rely on response sampling and self-consistency checks, our framework takes a three-fold approach: (i) a multi-agent debate setting in which agents critically examine the authenticity of generated content, (ii) a lightweight classifier head on top of an LLM-as-a-judge for better-calibrated detection, and (iii) confidence estimation to quantify uncertainty in hallucination decisions. This debate-based architecture enables a more nuanced and contextual evaluation of potential hallucinations across multiple modalities. Through extensive experiments on five benchmarks (TruthfulQA, Natural Questions, FEVER, HallusionBench, and HaluEval Summarisation), we demonstrate that our approach achieves significant improvements over baseline methods. Our framework also produces interpretable debate traces that explain the reasoning behind each hallucination determination. These results suggest that structured multi-agent debate offers a promising direction for improving the reliability and trustworthiness of language model outputs.
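The abstract does not include implementation details, but the three-fold pipeline it describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Agent` and `Judge` interfaces, the `run_debate` loop, and the temperature-scaled logistic head are all hypothetical stand-ins, not the authors' actual code or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
import math

# Hypothetical sketch of the three-fold pipeline described in the abstract:
# (1) a multi-agent debate over a claim, (2) a lightweight classifier head
# over judge-derived features, (3) a calibrated confidence score.
# All names and interfaces here are illustrative assumptions.

@dataclass
class DebateTurn:
    agent: str
    argument: str

# An agent maps (claim, evidence, transcript so far) -> its next argument.
Agent = Callable[[str, str, List["DebateTurn"]], str]
# A judge maps (claim, full transcript) -> a feature vector for the classifier.
Judge = Callable[[str, List["DebateTurn"]], List[float]]

def run_debate(claim: str, evidence: str,
               agents: List[Agent], rounds: int = 2) -> List[DebateTurn]:
    """Agents take turns critiquing the claim, each conditioning on the
    transcript produced so far."""
    transcript: List[DebateTurn] = []
    for _ in range(rounds):
        for i, agent in enumerate(agents):
            transcript.append(DebateTurn(f"agent_{i}",
                                         agent(claim, evidence, transcript)))
    return transcript

def classify(features: List[float], weights: List[float], bias: float,
             temperature: float = 1.5) -> Tuple[float, float]:
    """Logistic head over judge features. Temperature scaling stands in for
    calibration; distance from the decision boundary stands in for confidence."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    p_hallucination = 1.0 / (1.0 + math.exp(-logit / temperature))
    confidence = abs(p_hallucination - 0.5) * 2.0
    return p_hallucination, confidence
```

In this sketch, calibration is approximated by temperature-scaling the classifier logit; the paper's actual calibration and uncertainty-quantification methods may differ.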
Submission Number: 6