MixHD: A Method for Detecting Hallucinations Based on the Internal State and Output Probability of Large Language Models
Abstract: This paper presents a novel hallucination detection method based on the internal states and output probabilities of large language models (LLMs), addressing the common problem of hallucinations in model-generated content. We design a detection framework that extracts internal features, such as hidden-layer states and word probabilities, and analyzes the output probability distribution to identify potential hallucinations more accurately. Experimental results show that our method outperforms traditional hallucination detection approaches across multiple evaluation metrics and performs well on our EpicQA dataset. The method offers a new perspective on hallucination detection and contributes to improving the reliability of language models.
External IDs: dblp:conf/icassp/LiXHZ0025
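The abstract describes the general recipe of combining internal hidden states with output-probability signals. Below is a minimal sketch of that idea, not the authors' MixHD implementation: it pulls a pooled hidden-state vector and per-token probability statistics from a Hugging Face causal LM and concatenates them into a feature vector for a downstream hallucination classifier. The backbone model, pooling strategy, and chosen probability statistics are all illustrative assumptions.

```python
# Sketch: hidden-state + output-probability features for hallucination
# detection (assumed design, not the paper's MixHD architecture).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder backbone; the paper's LLM is not specified here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def extract_features(text: str) -> torch.Tensor:
    """Concatenate a pooled hidden-state vector with probability statistics."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # Internal-state feature: mean-pool the final layer over the sequence.
    hidden = outputs.hidden_states[-1].mean(dim=1).squeeze(0)

    # Output-probability features: log-probabilities the model assigned
    # to the tokens it actually produced, plus predictive entropy.
    logits = outputs.logits[0, :-1]           # predictions for tokens 1..n
    targets = inputs["input_ids"][0, 1:]      # the observed next tokens
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    prob_feats = torch.stack([token_lp.mean(), token_lp.min(), entropy.mean()])

    return torch.cat([hidden, prob_feats])

# A hypothetical downstream detector would be any binary classifier
# (e.g. logistic regression) trained on these features over responses
# labeled as hallucinated or faithful.
```

The intuition this sketch follows is that low token probabilities and high entropy often accompany unsupported content, while hidden states carry complementary evidence; a classifier over the joint feature vector can exploit both signals.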