LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
Abstract: Large language models (LLMs) have demonstrated exceptional capabilities in natural language processing tasks but often fall short in maintaining factual accuracy, particularly in knowledge-intensive domains such as healthcare. This study introduces LEAF: Learning and Evaluation Augmented by Fact-Checking, a novel framework aimed at improving the factual reliability of LLMs in medical question answering (QA). LEAF comprises three key contributions: (1) the Retrieval-Augmented Factuality Evaluator (RAFE), a robust fact-checking system that uses open-source LLMs and domain-specific retrieval corpora to evaluate response accuracy; (2) Fact-Check-then-RAG, an enhanced Retrieval-Augmented Generation (RAG) method that incorporates fact-checking results to guide retrieval without requiring parameter updates; and (3) Learning from Fact Check via Self-Training, a strategy that improves LLM performance through supervised fine-tuning or preference-based learning, using fact-checking results as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT at detecting inaccuracies, that Fact-Check-then-RAG effectively corrects errors, and that Learning from Fact Check improves performance without manually labeled data. These findings suggest that LEAF offers a scalable and robust solution for improving factual reliability in low-resource settings.
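To make the Fact-Check-then-RAG idea concrete, below is a minimal Python sketch of the generate/fact-check/re-retrieve loop the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: all names (generate, fact_check, retrieve, ClaimVerdict) are hypothetical stand-ins, and the word-overlap verifier and retriever are toy placeholders for RAFE and a real domain-specific retriever.

```python
# Hypothetical sketch of Fact-Check-then-RAG: draft an answer, verify its
# claims against a corpus, and re-retrieve evidence only for unsupported
# claims before regenerating. No model parameters are updated.
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool
    evidence: str


def generate(question: str, context: list[str]) -> str:
    """Stand-in for an LLM call; a real system would prompt a model here."""
    return f"Draft answer to: {question} (grounded in {len(context)} passages)"


def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)[:k]


def fact_check(answer: str, corpus: list[str]) -> list[ClaimVerdict]:
    """Stand-in for RAFE: split the answer into claims and check each one.
    A claim counts as supported if any passage shares a word with it."""
    verdicts = []
    for claim in answer.split(". "):
        words = set(claim.lower().split())
        hit = next((p for p in corpus if words & set(p.lower().split())), "")
        verdicts.append(ClaimVerdict(claim, bool(hit), hit))
    return verdicts


def fact_check_then_rag(question: str, corpus: list[str], max_rounds: int = 2) -> str:
    """Loop: generate, fact-check, and regenerate with evidence retrieved
    for the failed claims, stopping once every claim is supported."""
    context: list[str] = []
    answer = generate(question, context)
    for _ in range(max_rounds):
        failed = [v for v in fact_check(answer, corpus) if not v.supported]
        if not failed:
            break
        context = [p for v in failed for p in retrieve(v.claim, corpus, k=1)]
        answer = generate(question, context)
    return answer


if __name__ == "__main__":
    corpus = ["Metformin is a first-line therapy for type 2 diabetes."]
    print(fact_check_then_rag("What is the first-line therapy for type 2 diabetes?", corpus))
```

The design choice the sketch highlights is that fact-checking output steers retrieval (only unsupported claims trigger new queries), which is what lets the method improve factuality without any fine-tuning.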
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: fact checking, healthcare applications, clinical NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 150