Efficient Hallucination Detection for LLMs Using Uncertainty-Aware Attention Heads
Keywords: Hallucination detection, Large language models, Uncertainty quantification, Selective generation, Attention mechanisms
TL;DR: We introduce a new unsupervised method for hallucination detection for large language models, which integrates attention weights and token probabilities.
Abstract: While large language models (LLMs) have become highly capable, they remain prone to factual inaccuracies, commonly referred to as "hallucinations."
Uncertainty quantification (UQ) offers a promising way to mitigate this issue, but most existing methods are computationally intensive and/or require supervision. In this work, we propose Recurrent Attention-based Uncertainty Quantification (RAUQ), an unsupervised and efficient framework for identifying hallucinations. The method leverages an observation about transformer attention behavior: when incorrect information is generated, certain "uncertainty-aware" attention heads tend to reduce their focus on preceding tokens. RAUQ automatically detects these attention heads and combines their activation patterns with token-level confidence measures in a recurrent scheme, producing a sequence-level uncertainty estimate in just a single forward pass. Through experiments on twelve tasks spanning question answering, summarization, and translation across four different LLMs, we show that RAUQ consistently outperforms state-of-the-art UQ baselines.
Importantly, it incurs minimal overhead, requiring less than 1% additional computation.
Since it requires neither labeled data nor extensive parameter tuning, RAUQ serves as a lightweight, plug-and-play solution for real-time hallucination detection in white-box LLMs.
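To make the abstract's mechanism concrete, here is a minimal illustrative sketch of the two ideas it describes: selecting an "uncertainty-aware" head by its attention to the preceding token, and recurrently blending that head's attention with token-level confidence into a sequence score. The function names, the mixing weight `alpha`, and the exact recurrence are assumptions made for illustration; they are not the paper's published formulas.

```python
# Hypothetical sketch of a RAUQ-like score; the head-selection rule,
# `alpha`, and the recurrence are illustrative assumptions, not the
# paper's exact method.
import numpy as np

def select_uncertainty_head(attn):
    """attn: (heads, seq, seq) attention weights for one layer.
    Pick the head with the highest mean attention to the preceding
    token, mirroring the abstract's "uncertainty-aware heads" idea."""
    prev_attn = np.array(
        [attn[h, 1:, :-1].diagonal().mean() for h in range(attn.shape[0])]
    )
    return int(prev_attn.argmax())

def rauq_like_score(attn, token_probs, alpha=0.5):
    """Recurrently combine attention-to-previous-token with token
    negative log-probabilities; return a sequence-level uncertainty."""
    h = select_uncertainty_head(attn)
    state = 0.0  # recurrent uncertainty state
    scores = []
    for t in range(1, len(token_probs)):
        a_prev = attn[h, t, t - 1]                 # focus on preceding token
        conf = -np.log(token_probs[t] + 1e-12)     # token-level uncertainty
        # Low attention to the previous token and low token probability
        # both push the running state up.
        state = alpha * conf + (1 - alpha) * (state + (1.0 - a_prev))
        scores.append(state)
    return float(np.mean(scores))
```

In this sketch, a drop in the selected head's attention to the preceding token raises the score, consistent with the observation that such drops co-occur with incorrect generations; the single pass over tokens reflects the method's single-forward-pass cost.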
Submission Number: 48