ChainGuard: Entropy-Guided Intervention for Reducing Hallucinations in Chain-of-Thought Reasoning

10 Feb 2026 (modified: 01 May 2026) · Submitted to P-AGI · CC BY 4.0
Track: Track 1: Technical Foundations for a Post-AGI World
Keywords: hallucination detection, chain-of-thought reasoning, semantic entropy, LLM safety, intervention strategies
TL;DR: Semantic entropy over sampled continuations detects uncertain chain-of-thought steps and guides targeted retry interventions that reduce hallucinations by 84.6%.
Abstract: Large language models frequently hallucinate during chain-of-thought (CoT) reasoning, undermining reliability in critical applications. We present ChainGuard, a framework that uses semantic entropy—computed by clustering Sentence-BERT embeddings of sampled continuations via DBSCAN—to detect and correct hallucinated reasoning steps. Across 53 CoT examples from TruthfulQA and HaluEval, hallucinated answers exhibit higher mean entropy than correct ones ($3.15$ vs. $2.63$), and an entropy-guided retry intervention on 13 high-entropy cases ($H \geq 3.0$) reduces the hallucination rate from 100% to 15.4%. Although the overall point-biserial correlation is modest ($\rho = 0.15$, $p = 0.28$), the strong intervention results demonstrate that semantic entropy can serve as a practical triggering signal for improving CoT reliability. We release our dataset, entropy toolkit, and intervention framework to support further research.
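The abstract describes the pipeline only at a high level; the following Python sketch illustrates one plausible way to compute the semantic-entropy signal and wire it to the retry trigger, assuming the sentence-transformers and scikit-learn libraries. The encoder choice ("all-MiniLM-L6-v2"), the DBSCAN parameters, and the helper names (semantic_entropy, guarded_step) are illustrative assumptions, not the authors' released toolkit.

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import DBSCAN

# Any Sentence-BERT encoder works here; the specific model is an assumption.
_encoder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_entropy(continuations, eps=0.3, min_samples=1):
    """Cluster sampled continuations by meaning and return the Shannon entropy
    (in bits) over cluster sizes; higher entropy means more semantic disagreement."""
    embeddings = _encoder.encode(continuations, normalize_embeddings=True)
    # Cosine distance on normalized embeddings; min_samples=1 ensures every
    # continuation lands in a cluster (no -1 "noise" label from DBSCAN).
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(embeddings)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def guarded_step(generate_step, n_samples=10, threshold=3.0, max_retries=1):
    """Entropy-guided retry: resample a reasoning step when its semantic entropy
    meets the trigger threshold (H >= 3.0 in the abstract)."""
    for _ in range(max_retries + 1):
        samples = [generate_step() for _ in range(n_samples)]
        h = semantic_entropy(samples)
        if h < threshold:
            break
    return samples[0], h

Usage would pass a closure that samples one continuation of the current chain-of-thought step (e.g., a temperature-sampled LLM call); with 10 samples the maximum entropy is log2(10) ≈ 3.32 bits, so the 3.0-bit trigger corresponds to near-total disagreement among continuations.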
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Zhenhao_Wang3
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 33