Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding

ICLR 2026 Conference Submission9456 Authors

Published: 17 Sept 2025 (last modified: 26 Nov 2025) · ICLR 2026 Conference Submission · License: CC BY 4.0
Keywords: token-level, hallucination control, self-checking
TL;DR: Token-Guard applies self-checking decoding for token-level hallucination control, enhancing LLM quality and reliability.
Abstract: Large Language Models (LLMs) often hallucinate, generating content inconsistent with the input. Retrieval-Augmented Generation (RAG) and Reinforcement Learning from Human Feedback (RLHF) can mitigate hallucinations but require resource-intensive retrieval or large-scale fine-tuning. Decoding-based methods are lighter yet lack explicit hallucination control. To address this, we present \textbf{Token-Guard}, a token-level hallucination control method based on self-checking decoding. Token-Guard performs internal verification at each reasoning step to detect hallucinated tokens before they propagate. Candidate fragments are further evaluated in a latent space with explicit hallucination risk scoring, while iterative pruning and regeneration dynamically correct detected errors. Experiments on the HALU datasets show that Token-Guard substantially reduces hallucinations and improves generation accuracy, offering a scalable, lightweight solution for reliable LLM outputs. Our code is publicly available\footnote{Anonymous Github Link: \url{https://anonymous.4open.science/r/Token_Guard-00C3}}.
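To make the decoding loop described in the abstract concrete, the following is a minimal, illustrative sketch of token-level self-checking decoding with risk-based pruning and regeneration. It is not the authors' implementation: `propose_candidates`, `risk_score`, the candidate count, and `risk_threshold` are hypothetical stand-ins for the paper's proposal model and latent-space hallucination scorer.

```python
# Hypothetical sketch of self-checking decoding: at each step, candidate
# tokens are scored for hallucination risk; risky candidates are pruned and
# the step is regenerated before any token is committed to the output.
import random
from typing import Callable, List, Tuple

def self_checking_decode(
    propose_candidates: Callable[[List[str], int], List[Tuple[str, float]]],
    risk_score: Callable[[List[str], str], float],
    max_tokens: int = 32,
    risk_threshold: float = 0.5,   # assumed cutoff, not from the paper
    max_retries: int = 3,          # assumed regeneration budget
) -> List[str]:
    """Greedy decoding where every candidate token is vetted by a risk
    scorer; if all candidates look risky, the step is regenerated."""
    output: List[str] = []
    for _ in range(max_tokens):
        accepted = None
        for _ in range(max_retries):
            # Propose a few candidate continuations as (token, probability).
            candidates = propose_candidates(output, 4)
            # Keep only candidates whose estimated hallucination risk is low.
            safe = [(tok, p) for tok, p in candidates
                    if risk_score(output, tok) < risk_threshold]
            if safe:
                # Commit the most probable low-risk candidate.
                accepted = max(safe, key=lambda tp: tp[1])[0]
                break
            # Otherwise prune all candidates and regenerate this step.
        if accepted is None:
            break  # stop rather than emit a token flagged as risky
        output.append(accepted)
        if accepted == "<eos>":
            break
    return output

# Toy stand-ins so the sketch runs end to end without a real LLM.
VOCAB = ["Paris", "is", "the", "capital", "of", "France", ".", "<eos>"]

def propose_candidates(prefix: List[str], k: int) -> List[Tuple[str, float]]:
    return [(t, random.random()) for t in random.sample(VOCAB, k)]

def risk_score(prefix: List[str], token: str) -> float:
    # Placeholder verifier: a random risk in [0, 1) instead of a latent-space score.
    return random.random()

if __name__ == "__main__":
    print(" ".join(self_checking_decode(propose_candidates, risk_score)))
```

In practice the risk scorer would be derived from the model's own hidden states (the latent-space verification described above) rather than the random placeholder used here; the control flow of check, prune, and regenerate is the part the sketch is meant to convey.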
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 9456