Keywords: Large Language Models, Multi-Agent Systems, Code Generation, Framework Evaluation
Abstract: Multi-agent coding frameworks based on Large Language Models (LLMs) have emerged as a promising approach to improving automated code generation. By simulating collaborative software development teams, these systems coordinate multiple specialized LLM agents, such as coders, testers, and planners, to divide work, verify results, and iteratively refine code. This survey offers a detailed evaluation of leading multi-agent coding frameworks, including AgentCoder, CodeSIM, CodeCoR, and others, with a focus on their architectures, collaboration methods, and performance on common benchmarks such as HumanEval and MBPP. We identify key design features, including role specialization, feedback mechanisms, and structured workflows, that distinguish high-performing systems from weaker ones. Our analysis examines the trade-offs among scalability, speed, and code quality, and highlights persistent challenges such as communication overhead, reliability, and adaptability to different programming domains. We conclude by discussing future directions for building multi-agent coding systems that are more efficient, better integrated with tools, and more flexible across tasks. By reviewing current progress and highlighting practical lessons from recent frameworks, this work aims to support the development of the next generation of AI-assisted programming systems.
Submission Number: 8