Tool-MAD: Dynamic Multi-Agent Debate Framework with Adaptive Retrieval for Accurate and Hallucination-Resistant Fact Verification

ACL ARR 2025 May Submission3187 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have achieved impressive results across a wide range of language tasks, but they still struggle with hallucinations and factual inaccuracies, particularly in complex reasoning and fact verification tasks. To address these limitations, we introduce Tool-MAD, a novel multi-agent debate (MAD) framework designed to enhance factual verification by equipping agents with external tools, including search APIs and Retrieval-Augmented Generation (RAG) modules. Tool-MAD incorporates two core innovations: (1) an adaptive query formulation mechanism that enables agents to iteratively refine evidence retrieval based on evolving debate contexts and prior arguments, and (2) a novel consistency score, which quantitatively assesses the semantic similarity between agents' responses and retrieved evidence, allowing the Judge agent to reliably detect hallucinations and improve factual alignment. Experimental results on four benchmark datasets for fact verification demonstrate that Tool-MAD consistently outperforms other multi-agent debate frameworks. Furthermore, in the medical question answering domain, Tool-MAD demonstrates strong robustness and flexibility across alternative tools and domain settings.
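The abstract's consistency score compares an agent's response against retrieved evidence to flag hallucinations. The paper's exact formulation is not given on this page, so the sketch below is only an illustrative stand-in: it scores semantic overlap with a bag-of-words cosine similarity (a real system would likely use sentence embeddings), and the tokenizer, function names, and threshold idea are all assumptions.

```python
import math
import re
from collections import Counter


def _bow(text: str) -> Counter:
    # Lowercased bag-of-words token counts (illustrative tokenizer).
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def consistency_score(response: str, evidence: str) -> float:
    """Cosine similarity in [0, 1] between bag-of-words vectors of an
    agent's response and its retrieved evidence. A stand-in for the
    paper's (unspecified) semantic-similarity measure."""
    a, b = _bow(response), _bow(evidence)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# A Judge agent could flag a likely hallucination when the score
# falls below a tuned threshold (threshold value is hypothetical).
grounded = consistency_score(
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower is a landmark located in Paris, France.",
)
ungrounded = consistency_score(
    "The Eiffel Tower is in Berlin.",
    "Berlin is the capital of Germany.",
)
print(grounded > ungrounded)
```

With embedding-based similarity in place of bag-of-words, the same comparison would be more robust to paraphrase, but the flow (score each response against its evidence, let the Judge act on low scores) is unchanged.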
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, prompting, retrieval-augmented generation
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 3187