The Hallucination Detection Challenge: How Large Language Models Fail to Recognise Their Own Fabrications
Abstract: Large Language Models (LLMs), which are capable of generating fluent, human-like text, are increasingly involved in human text-editing workflows, producing a growing body of texts co-authored by LLMs and humans. This raises the question of whether LLMs can assess the factuality of such co-authored texts. In this paper, we construct a binary-classification dataset of texts co-authored by LLMs and humans. The dataset is built with an automated generation pipeline, so it can be expanded easily, and its positive and negative examples are highly similar, which makes inference on this dataset more challenging. Observing that LLM performance on this dataset fell short of expectations, we introduce a confidence score for LLM outputs based on their output consistency, which significantly improves the precision of the models' predictions.
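To illustrate the kind of consistency-based confidence scoring the abstract describes, the sketch below samples an LLM's binary judgement several times and uses the agreement rate as a confidence score, abstaining when the answers disagree too much. This is a minimal illustration under assumed details: the `query_llm` function, the prompt wording, the sample count, and the threshold are hypothetical placeholders, not the authors' exact method.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def consistency_confidence(
    text: str,
    query_llm: Callable[[str], str],  # hypothetical: returns the model's raw answer
    n_samples: int = 10,
    threshold: float = 0.8,
) -> Tuple[Optional[str], float]:
    """Query the LLM several times and keep a prediction only if the
    answers agree often enough (trading coverage for precision)."""
    prompt = (
        "Does the following text contain fabricated (hallucinated) content? "
        "Answer with exactly one word, 'yes' or 'no'.\n\n" + text
    )
    answers = [query_llm(prompt).strip().lower() for _ in range(n_samples)]
    label, count = Counter(answers).most_common(1)[0]
    confidence = count / n_samples  # agreement rate serves as the confidence score
    if confidence >= threshold:
        return label, confidence    # confident prediction
    return None, confidence         # abstain when outputs are inconsistent
```

Abstaining on low-agreement cases is one straightforward way such a score could raise precision on the retained predictions, consistent with the effect reported in the abstract.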
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Large Language Model, Hallucination Detection, Interpretability of LLMs, LLM Self-Verification
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 5779