Can We Catch the Elephant? A Survey of the Automatic Hallucination Evaluation on Natural Language Generation
Abstract: Hallucination in Natural Language Generation (NLG) presents a significant challenge, often underestimated despite recent advances in model fluency and grammatical correctness. As text generation systems evolve, hallucination evaluation has become increasingly critical, yet current methodologies remain complex and varied, lacking clear organization. In this paper, we conduct a comprehensive survey on Automatic Hallucination Evaluation (AHE) techniques. We systematically categorize existing approaches based on the proposed evaluation pipeline: datasets and benchmarks, evidence collection, and comparison mechanisms. Our work aims to clarify these diverse approaches, highlighting limitations and suggesting avenues for future research to improve the reliability and safety of NLG models.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: evaluation, evaluation methodologies
Contribution Types: Surveys
Languages Studied: English, Chinese, Multilingual
Submission Number: 912