Keywords: RL; LLM for code
TL;DR: AlgoForge trains two specialized LLM agents (a Planner and a Coder) with supervised cold-start datasets and collaboration-aware reinforcement learning, achieving significantly higher accuracy, efficiency, and robustness in code generation than the base models.
Abstract: Large language models (LLMs) have achieved impressive results in code generation across many programming tasks. However, most existing approaches rely on autoregressive decoding without global planning, often yielding locally coherent but globally suboptimal solutions, i.e., code that may fail to pass all test cases or incur unnecessary time or space complexity. Recent efforts, such as Chain-of-Thought (CoT) and multi-agent system (MAS) paradigms, introduce a planning stage, but their limited role specialization and coordination reduce effectiveness on complex tasks.
In this work, we present AlgoForge, a collaborative code generation framework that integrates two specialized LLM agents, a Planner and a Coder, to jointly perform plan‑to‑code translation. We first construct two dedicated cold‑start datasets, the Planner Dataset and the Coder Dataset, to inject algorithmic knowledge and instruction‑following skills into each agent via supervised fine‑tuning. Building upon this initialization, we further enhance both agents through a collaboration‑aware reinforcement learning stage based on Group Relative Policy Optimization (GRPO), enabling stronger specialization and alignment.
We evaluate AlgoForge on four benchmarks of varying difficulty (LiveBench, MBPP, CodeContests, and CodeForces) using three base models (Qwen2.5‑7B‑Instruct, Qwen2.5‑7B‑Coder‑Instruct, and Qwen2.5‑14B‑Coder‑Instruct). AlgoForge consistently outperforms the base models, improving Pass@1 by up to 12.2% on MBPP and 36.5% on CodeContests, while also reducing time and space complexity, lowering failure rates, and improving runtime efficiency and maintainability. These results demonstrate the effectiveness of combining role specialization with collaborative reinforcement learning for robust LLM‑based code generation.
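For reference, the standard GRPO objective as used in LLM post-training is sketched below (the paper's collaboration-aware variant may modify the reward or grouping; the symbols here are the conventional ones, not taken from the submission). For each prompt $q$, a group of $G$ outputs $o_1,\dots,o_G$ is sampled from the old policy, scored with rewards $r_i$, and optimized with group-normalized advantages:

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta)
= \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}
\min\!\left(\rho_i A_i,\ \operatorname{clip}\!\left(\rho_i,\, 1-\epsilon,\, 1+\epsilon\right) A_i\right)
\right]
- \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right),
\qquad
\rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)},
\qquad
A_i = \frac{r_i - \operatorname{mean}(\{r_j\}_{j=1}^{G})}{\operatorname{std}(\{r_j\}_{j=1}^{G})}
```

Unlike PPO, GRPO needs no learned value function: the group statistics serve as the baseline, which keeps the method lightweight for reward signals such as test-case pass rates.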
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24713