Keywords: Large Language Models, Hallucination Detection
Abstract: The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: *data-driven hallucinations* and *reasoning-driven hallucinations*. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the *Hallucination Risk Bound*, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This decomposition provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on it, we propose **HalluGuard**, a score based on the Neural Tangent Kernel (NTK) that leverages the geometry induced by the NTK and the representations it captures to jointly identify data-driven and reasoning-driven hallucinations. We validate **HalluGuard** through extensive experiments across 10 benchmarks, 9 LLM backbones, and 11 state-of-the-art detectors, consistently demonstrating its efficacy.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14994