Abstract: Recent advances in large language models (LLMs) have enabled agentic workflows in which multiple LLMs collaborate in specialized roles. Current approaches to designing such workflows face key limitations: manual design requires substantial human expertise, while existing automated frameworks struggle with optimization efficiency and task adaptability. To address these challenges, we present AutoSwarm, a system that trains an LLM orchestrator via reinforcement learning to generate executable workflow code. The generated code runs directly in a workflow runtime environment, and the orchestrator learns end-to-end through a reward mechanism that optimizes both performance and efficiency. AutoSwarm outperforms existing automated workflow methods, achieving a 1.91% accuracy improvement on reasoning benchmarks, and generalizes robustly, with a 1.25% performance gain on out-of-distribution tasks. Our work points to a promising direction for learning-based workflow orchestration.
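As a purely hypothetical illustration of the reward mechanism the abstract describes, a reward that trades off task performance against execution cost might be sketched as below. All names, parameters, and the weighting scheme are assumptions for illustration, not AutoSwarm's actual design.

```python
# Hypothetical sketch of a reward that balances task performance against
# workflow execution efficiency. The function name, inputs, and linear
# penalty are illustrative assumptions, not AutoSwarm's published method.

def workflow_reward(accuracy: float, num_llm_calls: int,
                    max_calls: int = 10, efficiency_weight: float = 0.2) -> float:
    """Reward = task performance minus a penalty for expensive workflows.

    accuracy:       fraction of benchmark items the generated workflow solves
    num_llm_calls:  LLM invocations the executed workflow consumed
    """
    # Normalize cost to [0, 1]; workflows at or over the call budget
    # receive the full efficiency penalty.
    cost = min(num_llm_calls, max_calls) / max_calls
    return accuracy - efficiency_weight * cost

# A workflow solving 80% of tasks with 4 of 10 allowed calls:
print(round(workflow_reward(0.8, 4), 4))  # 0.72
```

Under this sketch, two workflows with equal accuracy are ranked by cost, which matches the abstract's claim of optimizing both performance and efficiency.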
Paper Type: Long
Research Area: Generation
Research Area Keywords: interactive and collaborative generation, text-to-text generation, inference methods
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7171