Can We Catch the Elephant? A Survey of the Automatic Hallucination Evaluation on Natural Language Generation
Abstract: Hallucination in Natural Language Generation (NLG) presents a significant challenge, often underestimated despite recent advances in model fluency and grammatical correctness. As text generation systems evolve, hallucination evaluation has become increasingly critical, yet current methodologies remain complex and varied, lacking clear organization. In this paper, we conduct a comprehensive survey on Automatic Hallucination Evaluation (AHE) techniques. We systematically categorize existing approaches based on the proposed evaluation pipeline: datasets and benchmarks, evidence collection, and comparison mechanisms. Our work aims to clarify these diverse approaches, highlighting limitations and suggesting avenues for future research to improve the reliability and safety of NLG models.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: evaluation, evaluation methodologies
Contribution Types: Surveys
Languages Studied: English, Chinese, Multilingual
Submission Number: 912