Keywords: Span-Level Hallucination Detection, Automated Dataset Construction, Explainability
Abstract: Effective span-level hallucination detection is crucial for the trustworthy deployment of Large Language Models (LLMs) and relies on high-quality datasets as its foundational prerequisite. To alleviate the high cost of manual span-level annotation, prior work has turned to automatic dataset construction. Existing automatically constructed span-level datasets are typically created by synthesizing hallucinated claims via span perturbations paired with label annotations. However, such synthetic hallucinated claims often exhibit surface-level flaws such as disfluency, incoherence, or inconsistency, which distinguish them from the fluent, coherent, and consistent yet factually incorrect hallucinations produced by real-world LLMs; meanwhile, label-only annotations lack explainability and practical utility. To tackle these problems, we propose a novel self-refinement framework that generates more realistic hallucinated claims and explainable annotations by leveraging structured hallucination-inducing edit suggestions and a generate–critique–refine loop, resulting in a high-quality span-level dataset with explainable annotations, named ExpHul. Moreover, we train a compact hallucination detection model on ExpHul. Extensive experiments show that this detection model outperforms state-of-the-art baselines while providing high-quality explanations and revisions. Furthermore, the 8B-parameter detection model achieves performance competitive with leading closed-source LLMs such as Gemini-3.0-Pro and GPT-5.2 at a lower inference cost.
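A minimal sketch of how the generate–critique–refine loop described in the abstract might operate, assuming a text-in/text-out LLM callable; the function names (llm, propose_edit, apply_edit), prompts, output schema, and stopping criterion are all illustrative assumptions, not the paper's actual implementation.

```python
import json
from typing import Callable, Dict

def propose_edit(llm: Callable[[str], str], claim: str) -> Dict[str, str]:
    """Ask the model for a structured hallucination-inducing edit.
    Assumed to return {"span": ..., "replacement": ..., "rationale": ...};
    real code would validate the model's JSON output."""
    prompt = ("Suggest one factual edit for the claim below as JSON with "
              "keys 'span', 'replacement', 'rationale'.\nClaim: " + claim)
    return json.loads(llm(prompt))

def apply_edit(claim: str, edit: Dict[str, str]) -> str:
    """Inject the factual error by replacing the targeted span."""
    return claim.replace(edit["span"], edit["replacement"], 1)

def generate_critique_refine(llm: Callable[[str], str], claim: str,
                             max_rounds: int = 3) -> Dict[str, str]:
    """Generate-critique-refine loop: synthesize a realistic hallucinated
    claim together with an explainable span-level annotation."""
    edit = propose_edit(llm, claim)      # generate: structured edit suggestion
    candidate = apply_edit(claim, edit)
    for _ in range(max_rounds):
        # critique: flag surface-level flaws (disfluency, incoherence,
        # inconsistency) that would betray the claim as synthetic
        critique = llm("List fluency/coherence/consistency flaws, or reply "
                       "'NONE':\n" + candidate)
        if critique.strip().upper() == "NONE":
            break
        # refine: remove the flagged flaws while keeping the injected error
        candidate = llm("Rewrite to fix these flaws but keep the factual "
                        "error intact.\nFlaws: " + critique +
                        "\nClaim: " + candidate)
    return {"hallucinated_claim": candidate,
            "hallucinated_span": edit["replacement"],
            "explanation": edit["rationale"]}
```

Under these assumptions, each accepted output pairs the refined claim with the hallucinated span and its rationale, yielding the kind of explainable span-level annotation the abstract attributes to ExpHul.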
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: fact checking, rumor/misinformation detection
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 9705