HALT: A Framework for Hallucination Detection in Large Language Models

Agents4Science 2025 Conference Submission 320 Authors

17 Sept 2025 (modified: 08 Oct 2025) · Submitted to Agents4Science · CC BY 4.0
Keywords: LLM, Artificial intelligence, Hallucination, GPT, reasoning
TL;DR: A hallucination detector that flags when an LLM produces hallucinated responses.
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across many tasks, yet they notoriously \textit{hallucinate}, producing outputs that are plausible-sounding but factually incorrect or ungrounded. These hallucinations undermine trust in LLMs for critical applications. Prior efforts to improve LLM truthfulness (e.g., via fine-tuning with human feedback) have yielded only partial success, highlighting the need for automated hallucination detection methods that can generalize to new queries. This paper presents a systematic study of the hallucination phenomenon and proposes a novel detection framework. The framework combines multi-signal analysis (including model confidence, self-consistency checks, and cross-verification) to identify hallucinated content in a single LLM response without requiring multiple model calls or external knowledge bases. Experiments were conducted on two challenging reasoning tasks: GSM8K (math word problems) and StrategyQA (implicit commonsense reasoning), using outputs from a GPT-3.5-series model. Results show that the method outperforms baseline detectors in some cases. The detailed analysis provides an empirical picture of \emph{when} hallucinations occur (e.g., on out-of-distribution queries or multi-step reasoning) and demonstrates how the framework flags these failures. The paper concludes with insights on integrating hallucination detectors to improve LLM reliability and discusses future directions for more fine-grained and interpretable hallucination evaluation.
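To illustrate the kind of multi-signal aggregation the abstract describes, the sketch below combines three per-response signals (a confidence score, a self-consistency score, and a cross-verification score) into a single hallucination flag. This is a minimal illustrative sketch, not the paper's method: the signal names, how each signal is extracted, the weights, and the threshold are all assumptions made here for clarity.

```python
from dataclasses import dataclass


@dataclass
class ResponseSignals:
    """Per-response signals, each normalized to [0, 1] (1 = more trustworthy).

    How each signal is computed (e.g., mean token probability for confidence,
    agreement among reasoning steps for self-consistency, a verifier pass for
    cross-verification) is an assumption of this sketch, not taken from the paper.
    """
    confidence: float          # model's confidence in its own answer
    self_consistency: float    # internal agreement within the single response
    cross_verification: float  # agreement with an independent verification signal


def hallucination_score(signals: ResponseSignals,
                        weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of inverted signals; a higher score means more likely hallucinated."""
    w_conf, w_cons, w_ver = weights
    return (w_conf * (1.0 - signals.confidence)
            + w_cons * (1.0 - signals.self_consistency)
            + w_ver * (1.0 - signals.cross_verification))


def flag_hallucination(signals: ResponseSignals, threshold: float = 0.5) -> bool:
    """Flag the response when the combined score exceeds a (hypothetical) threshold."""
    return hallucination_score(signals) > threshold


if __name__ == "__main__":
    # Example: a confident but internally inconsistent and unverified response.
    sig = ResponseSignals(confidence=0.9, self_consistency=0.3, cross_verification=0.2)
    print(f"score={hallucination_score(sig):.2f}, flagged={flag_hallucination(sig)}")
```

A simple weighted-sum aggregator is used here only because it is the most transparent way to show how several signals can jointly flag a single response; the paper may use a different (e.g., learned) aggregation.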
Supplementary Material: zip
Submission Number: 320