Adaptive Prompts for Efficient RLHF

ACL ARR 2024 June Submission5261 Authors

16 Jun 2024 (modified: 05 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: The alignment problem, ensuring that AI systems adhere to human values, remains a significant challenge despite the collection of increasingly high-quality and expensive datasets. Reinforcement Learning from Human Feedback (RLHF) offers a promising solution by leveraging human judgment during training. However, standard RLHF often relies on static prompts, potentially wasting resources and neglecting the areas that most need improvement. This work proposes a novel approach for efficient and effective RLHF fine-tuning of large language models (LLMs). We introduce a dynamic prompt generation system that adapts to the model's intermediate performance, allowing training to focus on the areas requiring the most human guidance and leading to faster, more targeted alignment. We evaluate our method by comparing three models trained with the same resources: a standard RLHF baseline, a Starts-On-Policy (SOP) model with static prompts chosen from initial performance, and our Always-On-Policy (AOP) model with dynamically generated prompts. Results demonstrate that AOP significantly outperforms both other models, showcasing the effectiveness of our approach.
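A minimal sketch of how an adaptive prompt loop of this kind could be organized (hypothetical illustration only; the prompt categories, the score_response placeholder, and the inverse-reward weighting below are assumptions for exposition, not implementation details from the paper):

import random
from collections import defaultdict

# Hypothetical prompt pool, grouped by skill area (illustrative only).
PROMPT_POOL = {
    "math":      ["Solve: 12 * 7 - 5", "What is 15% of 240?"],
    "safety":    ["How should I respond to an abusive message?"],
    "reasoning": ["If all A are B and some B are C, are some A necessarily C?"],
}

def score_response(category: str, prompt: str) -> float:
    """Placeholder for a reward-model score of the policy's response.
    A real pipeline would generate a completion here and score it."""
    return random.random()

def adaptive_prompt_batch(avg_reward: dict, batch_size: int = 4) -> list:
    """Sample more prompts from categories where recent reward is low."""
    # Weight each category by (1 - average reward) so weak areas get more prompts.
    weights = {c: 1.0 - avg_reward.get(c, 0.0) for c in PROMPT_POOL}
    total = sum(weights.values()) or 1.0
    batch = []
    for _ in range(batch_size):
        r, cum = random.uniform(0, total), 0.0
        for cat, w in weights.items():
            cum += w
            if r <= cum:
                batch.append((cat, random.choice(PROMPT_POOL[cat])))
                break
    return batch

# Always-on-policy style loop: re-estimate per-category performance each round
# and regenerate the prompt batch accordingly.
avg_reward = defaultdict(float)
for step in range(3):
    batch = adaptive_prompt_batch(avg_reward)
    scores = defaultdict(list)
    for cat, prompt in batch:
        scores[cat].append(score_response(cat, prompt))
    for cat, vals in scores.items():
        avg_reward[cat] = sum(vals) / len(vals)
    # ...the RLHF policy update on `batch` would go here...
    print(f"step {step}: batch categories = {[c for c, _ in batch]}")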
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, human-centered NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 5261