The Hallucination Detection Challenge: How Large Language Models Fail to Recognise Their Own Fabrications
Abstract: Large Language Models (LLMs), which are capable of generating fluent, human-like text, are increasingly involved in human text-editing workflows, producing a growing body of texts co-authored by LLMs and humans. This raises the question of whether LLMs can assess the factuality of such co-authored texts. In this paper, we construct a binary-classification dataset of texts co-authored by LLMs and humans. The dataset is built with an automated generation pipeline, so it can be expanded easily, and its positive and negative examples are highly similar, which makes inference on this dataset more challenging. Observing that LLM performance on this dataset fell short of expectations, we introduce a confidence score for LLM outputs based on their output consistency, which significantly improves the precision of the models' predictions.
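To illustrate the kind of consistency-based confidence scoring the abstract describes, the sketch below samples an LLM's binary judgement several times and uses the agreement rate as a confidence score, abstaining when the answers disagree too much. This is a minimal illustration under assumed details: the `query_llm` function, the prompt wording, the sample count, and the threshold are hypothetical placeholders, not the authors' exact method.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def consistency_confidence(
    text: str,
    query_llm: Callable[[str], str],  # hypothetical: returns the model's raw answer
    n_samples: int = 10,
    threshold: float = 0.8,
) -> Tuple[Optional[str], float]:
    """Query the LLM several times and keep a prediction only if the
    answers agree often enough (trading coverage for precision)."""
    prompt = (
        "Does the following text contain fabricated (hallucinated) content? "
        "Answer with exactly one word, 'yes' or 'no'.\n\n" + text
    )
    answers = [query_llm(prompt).strip().lower() for _ in range(n_samples)]
    label, count = Counter(answers).most_common(1)[0]
    confidence = count / n_samples  # agreement rate serves as the confidence score
    if confidence >= threshold:
        return label, confidence    # confident prediction
    return None, confidence         # abstain when outputs are inconsistent
```

Abstaining on low-agreement cases is one straightforward way such a score could raise precision on the retained predictions, consistent with the effect reported in the abstract.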
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Large Language Model, Hallucination Detection, Interpretability of LLMs, LLM Self-Verification
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 5779