Abstract: Recent advances in large language models (LLMs) have enabled agentic workflows in which multiple LLMs collaborate in specialized roles. Current approaches to designing such workflows face key limitations: manual design requires substantial human expertise, while existing automated frameworks struggle with optimization efficiency and task adaptability. To address these challenges, we present AutoSwarm, a system that trains an LLM orchestrator via reinforcement learning to generate executable workflow code. The generated code runs directly in a workflow runtime environment, and the orchestrator learns end-to-end through a reward mechanism that optimizes both performance and efficiency. AutoSwarm outperforms existing automated workflow methods, achieving a 1.91% accuracy improvement on reasoning benchmarks, and generalizes robustly, with a 1.25% performance gain on out-of-distribution tasks. Our work points to a promising direction for learning-based workflow orchestration.
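As a purely hypothetical illustration of the reward mechanism the abstract describes, a reward that trades off task performance against execution cost might be sketched as below. All names, parameters, and the weighting scheme are assumptions for illustration, not AutoSwarm's actual design.

```python
# Hypothetical sketch of a reward that balances task performance against
# workflow execution efficiency. The function name, inputs, and linear
# penalty are illustrative assumptions, not AutoSwarm's published method.

def workflow_reward(accuracy: float, num_llm_calls: int,
                    max_calls: int = 10, efficiency_weight: float = 0.2) -> float:
    """Reward = task performance minus a penalty for expensive workflows.

    accuracy:       fraction of benchmark items the generated workflow solves
    num_llm_calls:  LLM invocations the executed workflow consumed
    """
    # Normalize cost to [0, 1]; workflows at or over the call budget
    # receive the full efficiency penalty.
    cost = min(num_llm_calls, max_calls) / max_calls
    return accuracy - efficiency_weight * cost

# A workflow solving 80% of tasks with 4 of 10 allowed calls:
print(round(workflow_reward(0.8, 4), 4))  # 0.72
```

Under this sketch, two workflows with equal accuracy are ranked by cost, which matches the abstract's claim of optimizing both performance and efficiency.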
Paper Type: Long
Research Area: Generation
Research Area Keywords: interactive and collaborative generation, text-to-text generation, inference methods
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7171