Mitigating Hallucinations in Large Language Models via Hybrid Reinforcement Learning

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Hallucination Mitigation, Reinforcement Learning from Human Feedback, Reinforcement Learning from AI Feedback, Hybrid Reinforcement Learning, Natural Language Processing, Factual Accuracy
TL;DR: We propose a Hybrid Reinforcement Learning framework that dynamically combines human and AI feedback to significantly reduce hallucinations in large language models while maintaining text quality and scalability.
Abstract: Large Language Models (LLMs) demonstrate remarkable text generation capabilities but remain prone to hallucinations---fluent outputs containing factual errors or unverifiable claims. We introduce Hybrid Reinforcement Learning (HRL), a framework that dynamically integrates Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) through a learnable, context-dependent weighting mechanism. Our approach computes an adaptive mixing parameter $\alpha(c,t)$ from 16-dimensional features capturing question complexity, model uncertainty, and training progress. Validated on TruthfulQA, HaluEval, and the Anthropic HH-RLHF dataset with real human preference annotations, HRL achieves a 5\% accuracy improvement and a 35\% reduction in hallucinations over static baselines, demonstrating effective integration of human judgment with scalable AI feedback.
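A minimal sketch of how the context-dependent weighting described in the abstract could be realized, assuming a small learned gating network over the 16-dimensional feature vector and a convex combination of human and AI reward signals; all class and function names (`AlphaGate`, `hybrid_reward`) and hyperparameters are illustrative, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class AlphaGate(nn.Module):
    """Learnable context-dependent mixing weight alpha(c, t) in (0, 1)."""
    def __init__(self, feature_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # squash output to (0, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, 16) -- e.g. question complexity, model
        # uncertainty, and training-progress signals
        return self.net(features).squeeze(-1)


def hybrid_reward(r_human: torch.Tensor,
                  r_ai: torch.Tensor,
                  alpha: torch.Tensor) -> torch.Tensor:
    """Convex combination of human-feedback and AI-feedback rewards."""
    return alpha * r_human + (1.0 - alpha) * r_ai


# Illustrative usage with random inputs:
gate = AlphaGate()
feats = torch.randn(4, 16)           # 16-dim context features per example
alpha = gate(feats)                   # adaptive mixing weight alpha(c, t)
reward = hybrid_reward(torch.rand(4), torch.rand(4), alpha)
```

The gate can be trained jointly with the policy so that, for instance, harder or higher-uncertainty questions lean more on human feedback while routine cases rely on cheaper AI feedback.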
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 14138