Abstract: Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models (LLMs). We introduce TOHA, a TOpology-based HAllucination detector for the retrieval-augmented generation (RAG) setting, which leverages a topological divergence metric to quantify the structural properties of graphs induced by attention matrices. Examining the topological divergence between prompt and response subgraphs reveals a consistent pattern: higher divergence values in specific attention heads correlate with hallucinated outputs, independent of the dataset.
Extensive experiments on question answering and summarization tasks show that our approach achieves state-of-the-art or competitive results on several benchmarks while requiring minimal annotated data and computational resources. Our findings suggest that analyzing the topological structure of attention matrices can serve as an efficient and robust indicator of factual reliability in LLMs.
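The abstract describes the divergence metric only at a high level. As a rough illustration of the general idea, the sketch below computes a simplified 0-dimensional, minimum-spanning-tree-based divergence for a single attention head: edges inside the response subgraph are made nearly free, so the MST total measures how costly it is to connect prompt tokens to the response. The function name `head_divergence`, the symmetrization, and the epsilon handling are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def head_divergence(attn, prompt_idx, response_idx, eps=1e-9):
    """Simplified 0-dim MST-based divergence for one attention head.

    attn         : (seq_len, seq_len) attention matrix of one head.
    prompt_idx   : token positions belonging to the prompt.
    response_idx : token positions belonging to the response.
    This is an illustrative proxy, not the paper's exact metric.
    """
    # Symmetrize attention weights and convert them into distances in (0, 1].
    sym = 0.5 * (attn + attn.T)
    dist = 1.0 - sym / sym.max()
    # scipy's csgraph treats exact zeros in a dense matrix as missing edges,
    # so clamp distances to a small positive floor.
    dist = np.maximum(dist, eps)
    # Make edges inside the response subgraph nearly free: the remaining MST
    # cost then reflects how hard it is to attach prompt tokens to the
    # response component.
    dist[np.ix_(response_idx, response_idx)] = eps
    np.fill_diagonal(dist, 0.0)  # zero diagonal = no self-loop edges
    return minimum_spanning_tree(dist).sum()
```

Under this reading, one would score a small set of hallucination-sensitive heads, selected on a handful of annotated examples, and flag a response when their divergences exceed a threshold.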
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: uncertainty, robustness
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Submission Number: 4096