Quantifying Genuine Awareness in Hallucination Prediction: Disentangling Question-Side Shortcuts

Published: 02 Mar 2026, Last Modified: 05 Mar 2026
Venue: Agentic AI in the Wild: From Hallucinations to Reliable Autonomy (Poster)
License: CC BY 4.0
Keywords: hallucination, LLM, self-awareness
TL;DR: We use a Shapley-based AQE to separate question-side shortcuts from model-side self-awareness in hallucination prediction, showing shortcut-heavy detectors fail under shift.
Abstract: Many works have proposed methodologies for language model (LM) hallucination detection and reported seemingly strong performance. However, we argue that the performance reported to date reflects not only a model’s genuine awareness of its internal information, but also awareness derived purely from question-side information (e.g., benchmark hacking). While benchmark hacking can boost hallucination detection scores on existing benchmarks, it does not generalize to out-of-domain settings or practical use. Nevertheless, disentangling how much of a model’s hallucination detection performance arises from question-side awareness is non-trivial. To address this, we propose the Approximate Question-side Effect (AQE), a methodology for measuring this effect without requiring human labor. Our analysis using AQE reveals that existing hallucination detection methods rely heavily on benchmark hacking.
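The abstract and TL;DR only state that AQE is Shapley-based; the exact formulation is not given here. The following is a minimal sketch, assuming AQE-style attribution amounts to computing Shapley values over two information sources for a hallucination detector: a question-only signal and a model-internal signal. The function names (`shapley_values`, `eval_detector`), the player names, and the toy AUROC numbers are all hypothetical placeholders, not the authors' implementation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet

def shapley_values(players: list[str],
                   value: Callable[[FrozenSet[str]], float]) -> dict[str, float]:
    """Exact Shapley values over a small set of information sources.

    `value(S)` should return the hallucination detector's performance
    (e.g., AUROC) when only the sources in S are available to it.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value(S | {p}) - value(S))
    return phi

# Hypothetical usage: two "players" = question-side signal vs. model-internal
# signal. `eval_detector` is a stand-in that would retrain/evaluate a detector
# restricted to the given sources; here it returns toy additive AUROC values.
def eval_detector(sources: FrozenSet[str]) -> float:
    baseline = 0.5  # chance-level AUROC with no information
    gains = {"question": 0.18, "model_internal": 0.12}
    return baseline + sum(gains[s] for s in sources)

phi = shapley_values(["question", "model_internal"], eval_detector)
print(phi)  # the "question" share plays the role of a question-side-effect estimate
```

Under this reading, a detector whose performance is mostly attributed to the question-side player is exploiting benchmark shortcuts rather than genuine self-awareness, which is the distinction the abstract draws.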
Submission Number: 42