LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models

Authors: Anonymous (ACL ARR 2024 December Submission 150)

10 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · License: CC BY 4.0
Abstract:

Large language models (LLMs) have demonstrated exceptional capabilities in natural language processing tasks but often fall short of maintaining factual accuracy, particularly in knowledge-intensive domains such as healthcare. This study introduces LEAF: Learning and Evaluation Augmented by Fact-Checking, a framework for improving the factual reliability of LLMs in medical question answering (QA). LEAF makes three key contributions: (1) the Retrieval-Augmented Factuality Evaluator (RAFE), a robust fact-checking system that uses open-source LLMs and domain-specific retrieval corpora to evaluate response accuracy; (2) Fact-Check-then-RAG, an enhanced Retrieval-Augmented Generation (RAG) method that incorporates fact-checking results to guide retrieval without requiring parameter updates; and (3) Learning from Fact Check via Self-Training, a strategy that improves LLM performance through supervised fine-tuning or preference-based learning, using fact-checking results as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT in detecting inaccuracies, Fact-Check-then-RAG effectively corrects errors, and Learning from Fact Check improves performance without labeled data. These findings position LEAF as a scalable and robust solution for improving LLM factuality in low-resource settings.
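The abstract describes the Fact-Check-then-RAG loop only at a high level. The sketch below is a minimal, hypothetical Python rendering of that loop, not the authors' implementation: the `llm`, `evaluator` (standing in for RAFE), and `corpus` objects, their methods (`generate`, `fact_check`, `retrieve`), and the two-round repair budget are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    """One atomic claim extracted from a model response."""
    text: str
    supported: bool  # verdict from the fact-checker (e.g., RAFE)


def fact_check_then_rag(question: str, llm, evaluator, corpus,
                        max_rounds: int = 2) -> str:
    """Answer a question, then iteratively repair unsupported claims.

    `llm.generate`, `evaluator.fact_check`, and `corpus.retrieve` are
    placeholder interfaces, not the paper's actual API.
    """
    answer = llm.generate(question)
    for _ in range(max_rounds):
        claims: List[Claim] = evaluator.fact_check(answer)
        flagged = [c for c in claims if not c.supported]
        if not flagged:
            return answer  # every claim passed the fact-check
        # Retrieval is guided by the *failed* claims rather than the
        # raw question, so the evidence targets the actual errors.
        evidence = [doc
                    for claim in flagged
                    for doc in corpus.retrieve(claim.text, k=3)]
        # Regenerate with the targeted evidence in context; the base
        # model's parameters are never updated.
        answer = llm.generate(question, context=evidence)
    return answer
```

Under the same reading, the fact-check verdicts can double as the pseudo-labels the abstract mentions for self-training: for instance, responses whose claims all pass could serve as preferred examples, and flagged responses as rejected ones, in preference-based fine-tuning.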

Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: fact checking, healthcare applications, clinical NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 150
