ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization
Abstract: Recent advances in reinforcement learning from human feedback (RLHF) and autoregressive transformers have driven the evolution of large language models such as GPT-4.0, DeepSeek R1, and Llama 3.3, enabling richer and more personalized responses. However, prevailing RLHF paradigms, from Proximal Policy Optimization (PPO) to Direct Preference Optimization (DPO), still rely on binary preference labels, which demand extensive human annotation, capture only coarse, group-level tastes, and adapt poorly to individual users. To address these limitations, we introduce Adaptive Reward-Following (ARF), a self-assessment framework that converts free-form user feedback into continuous preference signals via a satisfaction scorer reaching 70\% accuracy on GoEmotions, Sentiment140, and DailyDialog. We further refine and debias these signals through data augmentations (synonym replacement, trace truncation, and score-bias annotation) and use a Dynamic Adapter Preference Tracker to model evolving user tendencies in real time. Building on these components, our Trace Bias (TB) fine-tuning algorithm optimizes continuous reward trajectories rather than binary labels. Experiments on Qwen-2/2.5, Gemma-2, and Llama-3.2 across four preference domains show that ARF outperforms PPO by 3.3\% and DPO by 7.6\% while remaining aligned with RLHF objectives. ARF thus offers a scalable, personalized, and cost-effective paradigm for next-generation RLHF in large language models.
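To make the core idea concrete, below is a minimal, hedged sketch (not the authors' implementation) of what the abstract describes: mapping free-form user feedback to a continuous satisfaction score and using it to weight the policy's log-likelihood on its own response trace, in place of a binary preference label. All names here (score_satisfaction, trace_bias_loss, the cue-word lexicon, the 0-1 score range, the centering to [-1, 1]) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of a continuous-feedback fine-tuning signal.
# Assumption: PyTorch; the scorer and loss shown are simplified stand-ins.

import torch
import torch.nn.functional as F

POSITIVE = {"thanks", "great", "perfect", "helpful"}
NEGATIVE = {"wrong", "useless", "bad", "confusing"}

def score_satisfaction(feedback: str) -> float:
    """Hypothetical stand-in for the satisfaction scorer: maps free-form
    feedback text to a continuous score in [0, 1]."""
    tokens = feedback.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.5  # neutral when no cue words are present
    return pos / (pos + neg)

def trace_bias_loss(logits: torch.Tensor, response_ids: torch.Tensor,
                    satisfaction: float) -> torch.Tensor:
    """Weight the per-token log-likelihood of the generated trace by a
    continuous reward derived from the satisfaction score."""
    log_probs = F.log_softmax(logits, dim=-1)                          # (T, V)
    token_logp = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)
    reward = 2.0 * satisfaction - 1.0                                  # center to [-1, 1]
    return -(reward * token_logp).mean()                               # reinforce when reward > 0

# Toy usage: a 5-token response over a 10-token vocabulary.
logits = torch.randn(5, 10, requires_grad=True)
response = torch.randint(0, 10, (5,))
loss = trace_bias_loss(logits, response, score_satisfaction("thanks, that was helpful"))
loss.backward()
```

The design point this illustrates is the one the abstract emphasizes: the training signal is a continuous, per-interaction reward inferred from the user's own words, rather than a pairwise binary label collected from annotators.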
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: reinforcement learning, low-resource learning, efficient optimization, self-supervised learning
Contribution Types: Approaches to low-resource settings, Approaches for low compute settings-efficiency, Theory
Languages Studied: English
Submission Number: 219