Can We Catch the Elephant? A Survey of Automatic Hallucination Evaluation in Natural Language Generation

ACL ARR 2024 December Submission912 Authors

15 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: Hallucination in Natural Language Generation (NLG) remains a significant challenge, often underestimated despite recent advances in model fluency and grammatical correctness. As text generation systems evolve, evaluating hallucination has become increasingly critical, yet current methodologies are complex and varied, and lack clear organization. In this paper, we conduct a comprehensive survey of Automatic Hallucination Evaluation (AHE) techniques. We systematically categorize existing approaches along a proposed evaluation pipeline: datasets and benchmarks, evidence collection, and comparison mechanisms. Our work clarifies these diverse approaches, highlights their limitations, and suggests avenues for future research to improve the reliability and safety of NLG models.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: evaluation, evaluation methodologies
Contribution Types: Surveys
Languages Studied: English, Chinese, Multilingual
Submission Number: 912
