A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs

ACL ARR 2025 February Submission 7037 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · License: CC BY 4.0
Abstract: Large Language Models (LLMs) are prone to hallucination, i.e., they sometimes generate false or fabricated information. This presents a major challenge, because hallucinations often appear highly convincing and users generally lack the tools to detect them. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs and thereby for identifying potential hallucinations. In this work, we introduce pre-trained UQ heads: supervised auxiliary modules for LLMs that capture uncertainty substantially better than unsupervised UQ methods. Their strong performance stems from the Transformer architecture at their core and from informative features derived from LLM attention maps. Experimental evaluation shows that these heads are highly robust and achieve state-of-the-art performance in claim-level hallucination detection on both in-domain and out-of-domain prompts. Moreover, these modules generalize well to languages they were not explicitly trained on.
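The abstract describes the UQ head only at a high level (a supervised Transformer module over features derived from LLM attention maps, producing claim-level uncertainty scores). The PyTorch sketch below illustrates one plausible shape for such a module; every module name, dimension, and feature choice here is an assumption made for illustration, not the authors' implementation.

    # Hypothetical sketch of a pre-trained UQ head: a small supervised Transformer
    # encoder over per-token features derived from the base LLM's attention maps,
    # pooled over a claim span into a single hallucination (uncertainty) score.
    # Names, dimensions, and feature choices are illustrative assumptions.
    import torch
    import torch.nn as nn

    class UQHead(nn.Module):
        def __init__(self, feature_dim: int = 64, d_model: int = 128,
                     n_layers: int = 2, n_heads: int = 4):
            super().__init__()
            self.input_proj = nn.Linear(feature_dim, d_model)
            encoder_layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
            self.classifier = nn.Linear(d_model, 1)  # claim-level hallucination logit

        def forward(self, features: torch.Tensor, claim_mask: torch.Tensor) -> torch.Tensor:
            # features:   (batch, seq_len, feature_dim) per-token features, e.g.
            #             attention-map statistics and token log-probabilities
            # claim_mask: (batch, seq_len), 1 for tokens belonging to the claim
            h = self.encoder(self.input_proj(features))
            # mean-pool hidden states over the tokens of the claim
            mask = claim_mask.unsqueeze(-1).float()
            pooled = (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
            return self.classifier(pooled).squeeze(-1)  # higher = more likely hallucinated

    # Usage: score one claim spanning the first 5 of 32 generated tokens.
    head = UQHead()
    feats = torch.randn(1, 32, 64)
    mask = torch.zeros(1, 32)
    mask[0, :5] = 1
    score = torch.sigmoid(head(feats, mask))

Trained with a supervised objective (e.g., binary cross-entropy against claim-level hallucination labels), a head of this form could be attached to a frozen LLM as an auxiliary module, which matches the plug-in role the abstract describes.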
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Uncertainty quantification, Hallucination detection, Large language models (LLMs), Attention mechanisms, Supervised learning, Cross-lingual generalization, Transformer architecture, Claim-level analysis, Pre-trained modules, Model interpretability
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English, Russian, Chinese, German
Submission Number: 7037