Code in Harmony: Evaluating Multi-Agent Frameworks

UIUC Spring 2025 CS598 LLM Agent Workshop · Submission 8 Authors

17 Apr 2025 (modified: 18 Apr 2025) · UIUC Spring 2025 CS598 LLM Agent Workshop Submission · License: CC BY 4.0
Keywords: Large Language Models, Multi-Agent Systems, Code Generation, Framework Evaluation
Abstract: Multi-agent coding frameworks built on Large Language Models (LLMs) have emerged as a promising approach to improving automated code generation. By simulating collaborative software development teams, these systems coordinate multiple specialized LLM agents, such as coders, testers, and planners, to divide work, check results, and refine code over multiple iterations. This survey offers a detailed evaluation of leading multi-agent coding frameworks, including AgentCoder, CodeSIM, CodeCoR, and others, with a focus on their architectures, collaboration methods, and performance on common benchmarks such as HumanEval and MBPP. We identify the design features, including role specialization, feedback mechanisms, and structured workflows, that separate high-performing systems from weaker ones. Our analysis examines the trade-offs among scalability, speed, and code quality, and highlights persistent challenges such as communication overhead, reliability, and adaptability to different programming domains. We conclude by discussing future directions for building multi-agent coding systems that are more efficient, better integrated with tools, and more flexible across tasks. By reviewing current progress and distilling practical lessons from recent frameworks, this work aims to support the development of the next generation of AI-assisted programming systems.
Submission Number: 8
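
To make the coder/tester/planner loop described in the abstract concrete, the sketch below shows a minimal, hypothetical coder-tester-executor cycle. It is an illustrative toy, not the implementation of AgentCoder, CodeSIM, or CodeCoR; call_llm is an assumed stand-in for a real model backend, stubbed with canned outputs so the control flow runs end to end.

"""
Minimal sketch (assumption, not any surveyed framework's code):
a coder agent proposes code, a tester agent proposes tests, an
executor runs them, and failures are fed back for refinement.
"""
import subprocess
import sys
import tempfile
import textwrap


def call_llm(role: str, prompt: str) -> str:
    """Hypothetical LLM call. A real framework would query a model here;
    this stub returns canned outputs so the loop is runnable."""
    if role == "coder":
        return textwrap.dedent("""
            def add(a, b):
                return a + b
        """)
    if role == "tester":
        return textwrap.dedent("""
            assert add(2, 3) == 5
            assert add(-1, 1) == 0
        """)
    raise ValueError(f"unknown role: {role}")


def run_tests(code: str, tests: str) -> tuple[bool, str]:
    """Execute the generated code plus generated tests in a subprocess,
    returning (passed, captured output), mimicking an executor agent."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def solve(task: str, max_rounds: int = 3) -> str:
    """Coordinate the agents: generate, test, and refine over several rounds."""
    feedback = ""
    for _ in range(max_rounds):
        code = call_llm("coder", f"Task: {task}\nFeedback: {feedback}")
        tests = call_llm("tester", f"Task: {task}\nCode:\n{code}")
        passed, log = run_tests(code, tests)
        if passed:
            return code
        feedback = log  # next round's coder prompt includes the failure log
    return code


if __name__ == "__main__":
    print(solve("Write a function add(a, b) that returns the sum."))

The same skeleton generalizes to the frameworks discussed in the survey by swapping the stubbed call_llm for real model calls and adding further roles (e.g., a planner that decomposes the task before the coder runs).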
