HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection

Published: 06 Sept 2025, Last Modified: 21 Oct 2025, Accepted by TMLR, License: CC BY 4.0
Abstract: To mitigate the impact of the hallucinatory nature of LLMs, many studies propose detecting hallucinated generations through uncertainty estimation. However, these approaches predominantly operate at the sentence or paragraph level and fail to pinpoint the specific spans or entities responsible for hallucinated content. This lack of granularity is especially problematic for long-form outputs that mix accurate and fabricated information. To address this limitation, we explore entity-level hallucination detection. We propose a new dataset, HalluEntity, which annotates hallucinations at the entity level. Based on this dataset, we comprehensively evaluate uncertainty-based hallucination detection approaches across 17 modern LLMs. Our experimental results show that uncertainty estimation approaches focusing on individual token probabilities tend to over-predict hallucinations, while context-aware methods perform better but remain suboptimal. Through an in-depth qualitative study, we identify relationships between hallucination tendencies and linguistic properties and highlight important directions for future research. HalluEntity: https://huggingface.co/datasets/samuelyeh/HalluEntity
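To make "uncertainty estimation focusing on individual token probabilities" concrete at the entity level, the Python sketch below scores each entity by aggregating per-token negative log-likelihood over the entity's token span, flags entities above a threshold, and reports the false positive and false negative rates against entity-level labels. It is an illustration under stated assumptions, not the paper's method or the HalluEntity schema: the `Entity` class, the helper names (`token_nll`, `token_entropy`, `entity_scores`, `fpr_fnr`), the max/mean aggregation choices, the 0.5 threshold, and the toy probabilities are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical entity annotation: a token span plus an entity-level label."""
    start: int           # index of the entity's first token (inclusive)
    end: int             # index one past the entity's last token (exclusive)
    hallucinated: bool   # gold entity-level label


def token_nll(token_probs):
    """Negative log-likelihood of each generated token (higher = more uncertain)."""
    return [-math.log(p) for p in token_probs]


def token_entropy(dist):
    """Entropy of one next-token distribution, an alternative token-level score."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def entity_scores(token_uncertainty, entities, agg="max"):
    """Aggregate token-level uncertainty over each entity's token span."""
    scores = []
    for e in entities:
        span = token_uncertainty[e.start:e.end]
        if agg == "max":
            scores.append(max(span))
        elif agg == "mean":
            scores.append(sum(span) / len(span))
        else:
            raise ValueError(f"unknown aggregation: {agg}")
    return scores


def fpr_fnr(scores, labels, threshold):
    """FPR and FNR when entities scoring >= threshold are flagged as hallucinated."""
    preds = [s >= threshold for s in scores]
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(y and not p for p, y in zip(preds, labels))
    neg = sum(not y for y in labels)
    pos = len(labels) - neg
    return (fp / neg if neg else 0.0, fn / pos if pos else 0.0)


# Toy example (made-up numbers): six generated tokens, three entities,
# only the middle entity is hallucinated.
probs = [0.9, 0.8, 0.3, 0.2, 0.95, 0.6]
entities = [Entity(0, 2, False), Entity(2, 4, True), Entity(4, 6, False)]
labels = [e.hallucinated for e in entities]
nll = token_nll(probs)
print(fpr_fnr(entity_scores(nll, entities, "max"), labels, 0.5))   # (0.5, 0.0)
print(fpr_fnr(entity_scores(nll, entities, "mean"), labels, 0.5))  # (0.0, 0.0)
```

In this toy case, max-aggregation flags the correct third entity because of a single lower-probability token, a miniature version of the over-prediction pattern the abstract reports for token-probability methods. The numbers are illustrative only and do not reproduce any result from the paper.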
Certifications: J2C Certification
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank all reviewers for their insightful feedback. We have revised the paper accordingly; the major changes are summarized below:
- **[R1]** Add a limitation about the biography-only scope
- **[R1]** Explain the scenario of uncertain but correct generations in Sec. 5.3
- **[R1]** Add a description of the relation between FPR/FNR and entropy to Sec. 5.3
- **[R1, R2]** Add a limitation about errors/biases inherited from FActScore
- **[R1, R2]** Add details of the data quality assessment and report the results of multiple-run annotation and inner-annotator agreement in Appendix A
- **[R2]** Modify the example in Sec. 3.1 to align with Figure 1
- **[R3]** Change the notation for entities, token-level labels, and entity-level labels
- **[R3]** Remove the second y-axis in Figure 2 and put percentage values inside the bars
- **[R3]** Add a discussion of the proxy LLM

All added content is marked in red. (* We refer to Reviewer cHGr as R1, Reviewer XtDP as R2, and Reviewer 3S6k as R3.)
Assigned Action Editor: ~Matthew_Walter1
Submission Number: 4580