Keywords: hallucination mitigation, reinforcement learning, large language models
Abstract: Large Language Models (LLMs) are prone to generating hallucinated content, which compromises their reliability in knowledge-intensive tasks. To address this challenge without sacrificing creativity, we propose HARPO, a reinforcement learning framework designed to jointly optimize faithfulness and creativity. HARPO incorporates a Generative Reward Model (GRM), trained via verifiable feedback, to simultaneously assess faithful adherence and writing quality. Crucially, we employ a Selective Activation Mechanism (SAM) that acts as a conditional gate, incentivizing creativity only when outputs are hallucination-free. To further stabilize training, we implement a curriculum learning scheme that progressively shifts from creative writing data to hallucination-centric samples. Extensive experiments demonstrate that HARPO significantly improves faithfulness while preserving expressiveness, outperforming strong baselines.
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: post-training, hallucination, reinforcement learning, large language models, generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4640