LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Evolutionary Reinforcement Learning
Abstract: The integration of evolutionary algorithms (EAs) with reinforcement learning (RL) has shown superior performance compared to standalone methods. However, previous research focuses on exploration in the policy parameter space while overlooking the search over reward functions. To bridge this gap, we propose **LaRes**, a novel hybrid framework that achieves efficient policy learning through reward function search. LaRes leverages large language models (LLMs) to generate a population of reward functions that guide RL policy learning. The reward functions are evaluated by the resulting policy performance and improved through LLMs. To improve sample efficiency, LaRes employs a shared experience buffer that collects experiences from all policies, with each experience storing rewards from all reward functions. Upon reward function updates, the stored rewards are relabeled, enabling efficient reuse of historical data. Furthermore, we introduce a Thompson sampling-based selection mechanism that enables more efficient elite interaction. To prevent policy collapse when improving reward functions, we propose reward scaling and parameter constraint mechanisms that efficiently coordinate reward search with policy learning. Across both initialized and non-initialized settings, LaRes consistently achieves state-of-the-art performance, outperforming strong baselines in both sample efficiency and final performance. The code is available at https://github.com/yeshenpy/LaRes.
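
To make the two mechanisms named in the abstract concrete, below is a minimal Python sketch of (a) a shared experience buffer whose stored rewards are relabeled when the reward-function population is updated, and (b) a Thompson sampling rule for selecting an elite candidate. All class and function names here are illustrative assumptions, not taken from the LaRes codebase; see the linked repository for the authors' actual implementation.

```python
import random
import numpy as np

class SharedRewardRelabelBuffer:
    """Sketch of a shared buffer: every transition is scored by all reward
    functions, and the stored rewards are recomputed (relabeled) whenever
    the reward-function population changes, so historical data stays usable."""

    def __init__(self, reward_fns):
        self.reward_fns = list(reward_fns)   # current reward-function population
        self.transitions = []                # (state, action, next_state, done)
        self.rewards = []                    # per-transition reward vector, one entry per reward fn

    def add(self, state, action, next_state, done):
        # Score the new transition with every reward function in the population.
        self.transitions.append((state, action, next_state, done))
        self.rewards.append([fn(state, action, next_state) for fn in self.reward_fns])

    def relabel(self, new_reward_fns):
        # On a reward-function update, recompute all stored reward vectors
        # instead of discarding the collected experience.
        self.reward_fns = list(new_reward_fns)
        self.rewards = [
            [fn(s, a, ns) for fn in self.reward_fns]
            for (s, a, ns, _) in self.transitions
        ]

    def sample(self, batch_size, reward_idx):
        # Draw a batch labeled with the reward function of interest.
        idx = np.random.randint(len(self.transitions), size=batch_size)
        batch = [self.transitions[i] for i in idx]
        rewards = [self.rewards[i][reward_idx] for i in idx]
        return batch, rewards


def thompson_sampling_select(successes, failures):
    """Pick an elite index by sampling from Beta posteriors over each
    candidate's (binary) improvement record -- a standard Thompson sampling
    rule shown only to illustrate the selection idea."""
    draws = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return int(np.argmax(draws))
```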
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 27186