EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

Yang Cheng; Zilai Wang; Weiyu Ma; Wenhui Zhu; Mohamed Elhoseiny; Yue DENG; Jian Zhao

12 Sept 2025 (modified: 01 Dec 2025) | ICLR 2026 Conference Withdrawn Submission | Readers: Everyone | License: CC BY 4.0

Keywords: LLM Agents, Complex Task, Behavior Code, Self-evolve

Abstract: While large language models (LLMs) demonstrate remarkable capabilities across diverse domains, they fail catastrophically on high-complexity tasks that require long-horizon reasoning and multi-step coordination. To address this problem, we present EvoCurr, a self-evolving curriculum learning framework that enables LLMs to solve complex decision-making problems through cooperative multi-agent learning. At the core of EvoCurr is a cooperative multi-agent system in which a Designer agent generates adaptive task sequences and a Solver agent produces executable solutions through coordinated interaction. Both agents receive identical rewards based on task performance and proximity to the target task, which makes the framework fully cooperative and aligns their objectives for progressive skill acquisition. A critical innovation is the accepted-floor constraint, which prevents task difficulty from regressing below previously solved levels, ensuring monotonic skill advancement and guarding against catastrophic forgetting. The framework enforces feasibility through a validation gate and supports both open-loop code generation and closed-loop policy learning. We evaluate EvoCurr on two complementary domains: StarCraft II micro-management and Overcooked coordination tasks. On StarCraft II micro-management, where the Solver generates Python behavior-tree scripts for complex tactical scenarios, EvoCurr achieves average combat win rates above 90%, whereas state-of-the-art models achieve less than 50% when attempting these scenarios directly. On Overcooked coordination tasks, where the Solver uses multi-agent reinforcement learning to train cooperative policies, EvoCurr achieves 20% higher task completion rates (measured by dish orders delivered) than direct training. Our results demonstrate that EvoCurr provides a principled, domain-agnostic approach for extending LLM capabilities to complex decision-making tasks previously beyond their reach.
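The Designer/Solver loop and the accepted-floor constraint described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes difficulty is a single scalar in [0, 1] with the target task at 1.0, it stands in random proposals for the LLM Designer and a simple skill model for the Solver, and every name in it (`Task`, `propose_task`, `passes_gate`, `solve`, `shared_reward`, `run_curriculum`), the equal reward weighting, and the 0.8 acceptance threshold are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    # Assumption: difficulty is one scalar; 1.0 stands for the target task.
    difficulty: float


def propose_task(floor: float, rng: random.Random) -> Task:
    """Toy Designer: propose a difficulty near the current floor.

    Proposals may fall below the floor; the gate below rejects those.
    """
    return Task(difficulty=min(1.0, floor + rng.uniform(-0.2, 0.3)))


def passes_gate(task: Task, floor: float) -> bool:
    """Simplified validation gate with the accepted-floor check:
    a task must not be easier than the hardest task already solved."""
    return floor <= task.difficulty <= 1.0


def solve(task: Task, skill: float) -> float:
    """Toy Solver: performance in [0, 1], dropping as difficulty exceeds skill."""
    return max(0.0, min(1.0, 1.0 - (task.difficulty - skill)))


def shared_reward(performance: float, difficulty: float,
                  target: float = 1.0, weight: float = 0.5) -> float:
    """Identical reward for both agents: task performance plus proximity
    to the target task (the equal weighting is an assumption)."""
    proximity = 1.0 - abs(target - difficulty)
    return weight * performance + (1.0 - weight) * proximity


def run_curriculum(steps: int, seed: int = 0, threshold: float = 0.8):
    """Run the toy loop; return accepted difficulties and the final floor."""
    rng = random.Random(seed)
    floor, skill = 0.0, 0.1
    accepted = []
    for _ in range(steps):
        task = propose_task(floor, rng)
        if not passes_gate(task, floor):
            continue  # rejected: would regress below a solved level
        performance = solve(task, skill)
        if performance >= threshold:
            floor = task.difficulty            # raise the accepted floor
            skill = max(skill, task.difficulty)  # Solver keeps what it learned
            accepted.append(task.difficulty)
    return accepted, floor


if __name__ == "__main__":
    accepted, floor = run_curriculum(steps=200, seed=1)
    print(f"accepted {len(accepted)} tasks, final floor = {floor:.2f}")
    print(f"reward at target with perfect play: {shared_reward(1.0, 1.0):.2f}")
```

Because a task is accepted only when it sits at or above the current floor, and the floor is then set to that task's difficulty, the sequence of accepted difficulties in this sketch can never decrease, which is the monotonicity property the abstract attributes to the accepted-floor constraint.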

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 4390
