Keywords: hallucination mitigation, reinforcement learning, large language models
Abstract: Large Language Models (LLMs) are prone to generating hallucinated content, which compromises their reliability in knowledge-intensive tasks. To address this challenge without sacrificing creativity, we propose HARPO, a reinforcement learning framework designed to jointly optimize faithfulness and creativity. HARPO incorporates a Generative Reward Model (GRM), trained via verifiable feedback, to simultaneously assess faithful adherence and writing quality. Crucially, we employ a Selective Activation Mechanism (SAM) that acts as a conditional gate, incentivizing creativity only when outputs are hallucination-free. To further stabilize training, we implement a curriculum learning scheme that progressively shifts from creative writing data to hallucination-centric samples. Extensive experiments demonstrate that HARPO significantly improves faithfulness while preserving expressiveness, outperforming strong baselines.
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: post-training, hallucination, reinforcement learning, large language models, generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4640