LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
Large language models (LLMs) have demonstrated exceptional capabilities in natural language processing tasks but often fall short in maintaining factual accuracy, particularly in knowledge-intensive domains like healthcare. This study introduces LEAF: Learning and Evaluation Augmented by Fact-Checking, a novel framework designed to improve the factual reliability of LLMs in medical question answering (QA). LEAF comprises three key contributions: (1) the Retrieval-Augmented Factuality Evaluator (RAFE), a robust fact-checking system that uses open-source LLMs and domain-specific retrieval corpora to evaluate response accuracy; (2) Fact-Check-then-RAG, an enhanced Retrieval-Augmented Generation method that incorporates fact-checking results to guide retrieval without requiring parameter updates; and (3) Learning from Fact Check via Self-Training, a strategy that improves LLM performance through supervised fine-tuning or preference-based learning, using fact-checking results as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT in detecting inaccuracies, Fact-Check-then-RAG effectively corrects errors, and Learning from Fact Check improves performance without labeled data. These findings suggest that LEAF offers a scalable and robust solution for improving factual accuracy in low-resource settings.
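To make the Fact-Check-then-RAG idea concrete, the following is a minimal sketch of how fact-checking output could guide retrieval and regeneration without any parameter updates. All function names (generate, fact_check, retrieve) are hypothetical placeholders standing in for the base LLM, a RAFE-style evaluator, and a domain-specific retriever; this is an illustrative sketch under those assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a Fact-Check-then-RAG loop.
# generate, fact_check, and retrieve are assumed interfaces, not the paper's API.

from typing import Callable, List


def fact_check_then_rag(
    question: str,
    generate: Callable[[str], str],          # base LLM generation (frozen weights)
    fact_check: Callable[[str], List[str]],  # returns claims judged unsupported (e.g., by RAFE)
    retrieve: Callable[[str], List[str]],    # lookup in a domain-specific corpus
) -> str:
    """Answer a question, then use fact-checking results to guide retrieval
    and regenerate only when unsupported claims are found."""
    draft = generate(question)

    # Step 1: fact-check the draft answer claim by claim.
    unsupported = fact_check(draft)
    if not unsupported:
        return draft  # draft is already factually supported

    # Step 2: retrieve evidence targeted at the failed claims rather than
    # the question alone, so fact-checking guides what gets retrieved.
    evidence: List[str] = []
    for claim in unsupported:
        evidence.extend(retrieve(claim))

    # Step 3: regenerate with the retrieved evidence in the prompt;
    # the base model's parameters are never updated.
    augmented_prompt = (
        f"Question: {question}\n"
        "Evidence:\n" + "\n".join(f"- {e}" for e in evidence) + "\n"
        "Answer using only the evidence above."
    )
    return generate(augmented_prompt)
```

In the same spirit, Learning from Fact Check via Self-Training could reuse the fact-checker's verdicts as pseudo-labels, keeping responses that pass verification as fine-tuning targets or as preferred responses in preference-based learning; no human-labeled data would be required in either case.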