Keywords: multi-agent reinforcement learning, large language model, curriculum learning, sub-task decomposition
Abstract: Sub-task curriculum learning has shown promise in cooperative multi-agent reinforcement learning (MARL), especially under sparse rewards. However, existing approaches often rely on expert-designed templates or end-to-end learning, which limits their generalizability and efficiency. To address these limitations, we propose STAR-MARL (Sub-task Tree with Assisted Rewards), a fully automated framework that integrates large language models (LLMs) with the training dynamics of MARL agents. STAR-MARL uses Chain-of-Thought prompting and few-shot learning to generate a hierarchical, interpretable sub-task tree, in which each node contains executable training scenarios and curriculum reward functions. A key challenge in MARL curriculum design lies in evaluating the quality of sub-tasks, since online MARL training rollouts are computationally expensive and unstable. To address this, we introduce a retrieval-augmented generation (RAG)-based sub-curriculum evaluator that leverages MARL training trajectories to estimate the potential policy improvement of candidate reward functions without further environment interaction. Built atop a memory of historical sub-task trajectories, the evaluator enables offline curriculum evaluation and rapid curriculum refinement, making curriculum learning more sample-efficient and scalable. We apply STAR-MARL to the Cooking Zoo and Google Research Football environments, generating interpretable curriculum tasks of varying complexity. Our work paves the way for constructing interpretable, low-cost, and generalizable LLM-driven curricula for MARL.
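The following is a minimal illustrative sketch, not taken from the submission, of the two components the abstract describes: a sub-task tree node holding an executable scenario and a curriculum reward function, and an offline evaluator that scores candidate rewards against a memory of past trajectories rather than new rollouts. All names here (SubTaskNode, TrajectoryMemory, evaluate_curriculum_reward) are hypothetical.

```python
# Illustrative sketch only: class and function names are hypothetical and do
# not come from the STAR-MARL paper; they mirror the components the abstract
# describes (LLM-generated sub-task tree nodes + offline RAG-style evaluator).
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubTaskNode:
    """One node of the LLM-generated sub-task tree."""
    name: str
    scenario_config: Dict                 # executable training scenario
    reward_fn: Callable[[Dict], float]    # curriculum (assisted) reward function
    children: List["SubTaskNode"] = field(default_factory=list)


class TrajectoryMemory:
    """Memory of historical MARL training trajectories."""

    def __init__(self, trajectories: List[Dict]):
        self.trajectories = trajectories

    def retrieve(self, scenario_config: Dict, k: int = 16) -> List[Dict]:
        # Toy retrieval: rank stored trajectories by overlap of scenario keys.
        # A real RAG pipeline would embed scenarios and search by similarity.
        def score(traj: Dict) -> int:
            return len(set(traj.get("scenario", {}).items())
                       & set(scenario_config.items()))
        return sorted(self.trajectories, key=score, reverse=True)[:k]


def evaluate_curriculum_reward(node: SubTaskNode, memory: TrajectoryMemory) -> float:
    """Offline estimate of how much a candidate reward function could improve
    the policy, computed from retrieved trajectories instead of new rollouts."""
    retrieved = memory.retrieve(node.scenario_config)
    if not retrieved:
        return 0.0
    # Re-score stored transitions with the candidate reward and compare against
    # the return each trajectory originally achieved (a crude improvement proxy).
    gains = []
    for traj in retrieved:
        reshaped = sum(node.reward_fn(step) for step in traj["steps"])
        gains.append(reshaped - traj.get("return", 0.0))
    return sum(gains) / len(gains)
```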
Submission Number: 24