Forging Better Rewards: A Multi-Agent LLM Framework for Automated Reward Evolution

ICLR 2026 Conference Submission16594 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: LLM-based Agents, reinforcement learning, autonomy, application to robotics, gaming

TL;DR: FORGE is a multi-agent LLM framework that improves reward design via structured initialization, evolutionary refinement, and complexity-aware memory, achieving strong zero-shot performance, stable evolution, and up to 38.5% gains on Humanoid.

Abstract: Large Language Models (LLMs) have shown increased autonomy in performing complex tasks, but their inference latency and fine-tuning cost impose significant limitations on their application in dynamic, real-time environments such as robotics and gaming. Reinforcement learning (RL), by contrast, offers efficient execution and has shown strong results in diverse domains. Yet its progress is often bottlenecked by the challenge of designing effective reward functions, which are typically sparse and require heavy manual effort to engineer. Recent work has explored LLM-based reward generation, which reduces manual effort yet remains unstable, unstructured, and opaque. Building on the enhanced reasoning capabilities of modern LLMs, we advance this line of research toward full automation by introducing structured reward initialization, evolutionary refinement, and explicit complexity modeling. These innovations reduce reliance on manual trial-and-error while enabling more stable, interpretable, and scalable reward design. We unify them into FORGE (Feedback-Optimized Reward Generation and Evolution), a multi-agent framework that automatically forges increasingly effective reward functions. Extensive experiments across three games and a robotics task demonstrate the effectiveness of FORGE, which achieves up to 38.5% improvement over Eureka and 19.0% over REvolve on the Humanoid task while maintaining competitive token efficiency.

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Submission Number: 16594
