Keywords: large language models, hallucination detection
Abstract: Diffusion large language models (D-LLMs) have recently emerged as a promising alternative to auto-regressive LLMs (AR-LLMs). However, the hallucination problem in D-LLMs remains underexplored, limiting their reliability in real-world applications. Existing hallucination detection methods are designed for AR-LLMs and rely on signals from \emph{single-step} generation, making them ill-suited for D-LLMs, where hallucination signals often emerge throughout the \emph{multi-step} denoising process. To bridge this gap, we propose \textbf{TraceDet}, a novel framework that explicitly leverages the intermediate denoising steps of D-LLMs for hallucination detection. TraceDet models the denoising process as an \emph{action trace}, with each action defined as the model's prediction of the clean response conditioned on the previous intermediate output. By identifying the sub-trace that is maximally informative about hallucinated responses, TraceDet exploits the key hallucination signals that emerge during the multi-step denoising process of D-LLMs. Extensive experiments on various open-source D-LLMs demonstrate that \textbf{TraceDet} consistently improves hallucination detection, achieving an average AUROC gain of 15.2\% over baselines.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13302
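To make the abstract's trace-based idea concrete, below is a minimal sketch of how one might score a denoising trace for hallucination. It assumes access to per-step token log-probabilities of the model's predicted clean response (an array named `trace_logprobs` here, which is hypothetical); the per-step surrogate score and the Kadane-style contiguous sub-trace selection are illustrative stand-ins for TraceDet's actual maximally-informative sub-trace objective, which the abstract does not specify.

```python
# Illustrative sketch only: names (step_scores, best_subtrace, detection_score)
# and the scoring heuristic are assumptions, not the paper's method.
import numpy as np

def step_scores(trace_logprobs: np.ndarray) -> np.ndarray:
    """Per-step hallucination signal: negative mean token log-prob of the
    predicted clean response at each denoising step.
    trace_logprobs: (num_steps, num_tokens) array of token log-probabilities."""
    return -trace_logprobs.mean(axis=1)

def best_subtrace(scores: np.ndarray) -> tuple[int, int, float]:
    """Kadane's algorithm over mean-centered scores: finds the contiguous
    sub-trace with maximal total score, a simple proxy for selecting the
    'maximally informative' sub-trace."""
    centered = scores - scores.mean()  # low-signal steps become negative
    best_sum, best = -np.inf, (0, 1)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(centered):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best[0], best[1], float(best_sum)

def detection_score(trace_logprobs: np.ndarray) -> float:
    """Aggregate the selected sub-trace into a scalar score; higher values
    suggest the response is more likely hallucinated."""
    scores = step_scores(trace_logprobs)
    lo, hi, _ = best_subtrace(scores)
    return float(scores[lo:hi].mean())

# Usage with synthetic data: 16 denoising steps, 32 response tokens.
rng = np.random.default_rng(0)
fake_trace = rng.normal(loc=-2.0, scale=0.5, size=(16, 32))
print(detection_score(fake_trace))
```

The contiguous sub-trace assumption is one plausible reading of "sub-trace"; in practice the informative steps could be selected non-contiguously or learned end-to-end, which this sketch does not attempt.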