Reinforcement Learning from Human Text Feedback: Learning a Reward Model from Human Text Input

Published: 17 Jun 2024, Last Modified: 02 Jul 2024
Venue: ICML 2024 Workshop MHFAIA Poster
License: CC BY 4.0
Keywords: Reinforcement learning, human feedback, large language models, reward models
Abstract: We explore the use of human-generated text inputs to model rewards in Reinforcement Learning from Human Feedback (RLHF). Human text contains rich and nuanced information, yet most previous work relies on preference feedback or restricts the structure of the text. We propose using Large Language Models (LLMs) to harness the information in natural text and train a reward model efficiently. Our empirical evaluations demonstrate the advantages of this approach in both tabular and continuous reinforcement learning tasks. The results show that even with minimal human interaction, integrating text feedback through LLMs enables our method to approximate the reward function accurately, leading to significant performance improvements.
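The page gives no implementation details, but the core idea of converting free-form text feedback into reward-model training data via an LLM can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: `query_llm` is a hypothetical stand-in for any LLM call that maps a human comment about a state-action pair to a scalar score, and the reward model is a small MLP regressed onto those scores.

```python
# Hypothetical sketch: turn free-form human text feedback into reward-model
# training targets with an LLM, then fit a small reward model. All names here
# (query_llm, the prompt format, the MLP shape) are illustrative assumptions.
import torch
import torch.nn as nn

def query_llm(prompt: str) -> float:
    """Stand-in for an LLM call that returns a scalar rating in [-1, 1].
    Replace with a real LLM API of your choice."""
    raise NotImplementedError

def score_transition(state, action, feedback_text: str) -> float:
    """Ask the LLM to rate a state-action pair in light of the human's text."""
    prompt = (
        f"Human feedback: {feedback_text}\n"
        f"State: {state}\nAction: {action}\n"
        "On a scale from -1 (bad) to 1 (good), rate this action. "
        "Answer with a single number."
    )
    return query_llm(prompt)

class RewardModel(nn.Module):
    """Small MLP mapping (state, action) features to a scalar reward."""
    def __init__(self, input_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def fit_reward_model(features: torch.Tensor, llm_scores: torch.Tensor,
                     epochs: int = 200, lr: float = 1e-3) -> RewardModel:
    """Regress the reward model onto LLM-derived scores with an MSE loss."""
    model = RewardModel(features.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(features), llm_scores)
        loss.backward()
        opt.step()
    return model
```

The fitted reward model can then replace the environment reward when training a policy with any standard RL algorithm; the specific policy-optimization choices are not described on this page.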
Submission Number: 63