Correcting Hallucinations in News Summaries: Exploration of Self-Correcting LLM Methods with External Knowledge
Abstract: While large language models (LLMs) have shown remarkable capabilities in generating coherent text, they suffer from hallucinations -- factual inaccuracies. Self-correcting systems are especially promising for tackling hallucinations: they leverage the multi-turn nature of LLMs to iteratively generate verification questions that request additional evidence, answer them with internal or external knowledge, and use the answers to refine the original response. These methods have been explored for encyclopedic generation, but less so for domains such as news summarization. In this work, we investigate two state-of-the-art self-correcting systems, applying them to hallucinated news summaries with three search engines as external knowledge sources and evaluating the corrected outputs. We analyze the results and provide qualitative insights into the systems' performance, with practical findings on G-Eval and human evaluation and on the benefits of search snippets and few-shot prompts.
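To make the verify-and-revise loop described in the abstract concrete, here is a minimal Python sketch of a generic self-correcting pipeline with external knowledge. It is an illustration only, not the paper's actual implementation; the callables llm and search, the prompt wording, and the single-round default are all assumptions.

from typing import Callable, List


def self_correct(summary: str,
                 llm: Callable[[str], str],
                 search: Callable[[str], str],
                 rounds: int = 1) -> str:
    """Iteratively verify a summary against external evidence and revise it.

    `llm` is a hypothetical stand-in for a chat-model call; `search` is a
    hypothetical stand-in for a search-engine query returning snippets.
    """
    for _ in range(rounds):
        # 1. Ask the model for verification questions about claims in the summary.
        questions = llm(
            "List verification questions, one per line, for the factual "
            f"claims in this summary:\n{summary}"
        ).splitlines()

        # 2. Answer each question using external evidence (e.g., search snippets).
        evidence: List[str] = []
        for q in questions:
            snippets = search(q)
            answer = llm(f"Question: {q}\nEvidence:\n{snippets}\nAnswer concisely:")
            evidence.append(f"Q: {q}\nA: {answer}")

        # 3. Revise the summary so it is consistent with the collected evidence.
        summary = llm(
            "Revise the summary so it is consistent with the evidence.\n"
            f"Summary:\n{summary}\n\nEvidence:\n" + "\n".join(evidence)
        )
    return summary

In practice, the llm and search callables would wrap a specific model API and one of the three search engines studied; the loop structure itself is what the self-correcting systems share.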
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic evaluation of datasets, evaluation, metrics
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 3952