LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
Abstract: Large language models (LLMs) have demonstrated exceptional capabilities in natural language processing tasks but often fall short in maintaining factual accuracy, particularly in knowledge-intensive domains such as healthcare. This study introduces LEAF: Learning and Evaluation Augmented by Fact-Checking, a novel framework aimed at improving the factual reliability of LLMs in medical question answering (QA). LEAF comprises three key contributions: (1) the Retrieval-Augmented Factuality Evaluator (RAFE), a robust fact-checking system that uses open-source LLMs and domain-specific retrieval corpora to evaluate response accuracy; (2) Fact-Check-then-RAG, an enhanced Retrieval-Augmented Generation (RAG) method that incorporates fact-checking results to guide retrieval without requiring parameter updates; and (3) Learning from Fact Check via Self-Training, a strategy that improves LLM performance through supervised fine-tuning or preference-based learning, using fact-checking results as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT at detecting inaccuracies, that Fact-Check-then-RAG effectively corrects errors, and that Learning from Fact Check improves performance without manually labeled data. These findings suggest that LEAF offers a scalable and robust solution for improving factual reliability in low-resource settings.
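To make the Fact-Check-then-RAG idea concrete, below is a minimal Python sketch of the generate/fact-check/re-retrieve loop the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: all names (generate, fact_check, retrieve, ClaimVerdict) are hypothetical stand-ins, and the word-overlap verifier and retriever are toy placeholders for RAFE and a real domain-specific retriever.

```python
# Hypothetical sketch of Fact-Check-then-RAG: draft an answer, verify its
# claims against a corpus, and re-retrieve evidence only for unsupported
# claims before regenerating. No model parameters are updated.
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool
    evidence: str


def generate(question: str, context: list[str]) -> str:
    """Stand-in for an LLM call; a real system would prompt a model here."""
    return f"Draft answer to: {question} (grounded in {len(context)} passages)"


def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)[:k]


def fact_check(answer: str, corpus: list[str]) -> list[ClaimVerdict]:
    """Stand-in for RAFE: split the answer into claims and check each one.
    A claim counts as supported if any passage shares a word with it."""
    verdicts = []
    for claim in answer.split(". "):
        words = set(claim.lower().split())
        hit = next((p for p in corpus if words & set(p.lower().split())), "")
        verdicts.append(ClaimVerdict(claim, bool(hit), hit))
    return verdicts


def fact_check_then_rag(question: str, corpus: list[str], max_rounds: int = 2) -> str:
    """Loop: generate, fact-check, and regenerate with evidence retrieved
    for the failed claims, stopping once every claim is supported."""
    context: list[str] = []
    answer = generate(question, context)
    for _ in range(max_rounds):
        failed = [v for v in fact_check(answer, corpus) if not v.supported]
        if not failed:
            break
        context = [p for v in failed for p in retrieve(v.claim, corpus, k=1)]
        answer = generate(question, context)
    return answer


if __name__ == "__main__":
    corpus = ["Metformin is a first-line therapy for type 2 diabetes."]
    print(fact_check_then_rag("What is the first-line therapy for type 2 diabetes?", corpus))
```

The design choice the sketch highlights is that fact-checking output steers retrieval (only unsupported claims trigger new queries), which is what lets the method improve factuality without any fine-tuning.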
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: fact checking, healthcare applications, clinical NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 150