RLKGF: Reinforcement Learning from Knowledge Graph Feedback Without Human Annotations

ACL ARR 2025 February Submission 6037 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Reinforcement Learning from Human Feedback (RLHF) has been shown to effectively align large language models (LLMs) with human knowledge. However, the requirement for human preference labels remains a significant bottleneck when applying RLHF to a downstream domain. Existing evaluations of LLMs primarily focus on the semantic relevance between questions and responses, as well as the accuracy of the reasoning paths, which align with the implicit semantics and explicit structural links in knowledge graphs (KGs). Inspired by this observation, we propose Reinforcement Learning from Knowledge Graph Feedback (RLKGF), a novel method that leverages KG semantics and structure to derive RL rewards in the absence of manual annotations. Unlike Reinforcement Learning from AI Feedback (RLAIF), RLKGF directly integrates the human priors encoded in KGs as the reward model, aligning LLM responses with expert knowledge without additional preference labeling or reward-model training. RLKGF structures context-relevant facts into knowledge subgraphs and defines rewards by simulating information flow across the semantic and logical connections between question and candidate-response entities. Experiments on three public and one private medical dialogue dataset demonstrate that RLKGF significantly outperforms competitive RLAIF in improving LLM diagnostic accuracy, highlighting the effectiveness of KG-based reward feedback for LLM knowledge alignment. Code will be available.
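To make the abstract's reward mechanism concrete, the sketch below illustrates one plausible reading of a KG-derived reward: restrict the KG to a context-relevant subgraph around the linked question and response entities, then score the response by how much simulated information flow from the question entities reaches the response entities. The two-hop subgraph cut, the personalized-PageRank propagation, and all function names are illustrative assumptions, not the authors' actual formulation.

```python
# Hypothetical sketch of a KG-based reward in the spirit of RLKGF.
# The subgraph construction and propagation rule are illustrative assumptions.
import networkx as nx


def kg_reward(kg: nx.Graph, question_entities, response_entities,
              alpha: float = 0.85) -> float:
    """Score a candidate response by the information flow it receives
    from the question entities over a context-relevant knowledge subgraph."""
    # Keep only context-relevant facts: the subgraph induced by nodes within
    # two hops of any linked question or response entity (an assumed cutoff).
    seeds = [e for e in set(question_entities) | set(response_entities) if e in kg]
    if not seeds:
        return 0.0
    neighborhood = set(seeds)
    for e in seeds:
        neighborhood |= set(nx.single_source_shortest_path_length(kg, e, cutoff=2))
    subgraph = kg.subgraph(neighborhood)

    # Simulate information flow with a random walk restarted at the question
    # entities (personalized PageRank); responses whose entities are well
    # connected to the question through expert knowledge accumulate more mass.
    sources = {e: 1.0 for e in question_entities if e in subgraph}
    if not sources:
        return 0.0
    scores = nx.pagerank(subgraph, alpha=alpha, personalization=sources)
    return sum(scores.get(e, 0.0) for e in response_entities)
```

In an RLKGF-style training loop, a scalar reward of this kind would stand in for a learned reward model, e.g. fed to a PPO trainer after entity linking on each generated response; the entity linker and RL algorithm are likewise left unspecified here.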
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: healthcare applications, clinical NLP, knowledge graphs
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: Chinese
Submission Number: 6037