Keywords: Automated Program Repair, Large Language Models, Multi-Agent, LLM-Based APR, Open-Source LLMs
Abstract: Automated Program Repair (APR) has recently advanced through the adoption of Large Language Models (LLMs). However, state-of-the-art performance typically relies on large-scale proprietary models (e.g., GPT-3.5/4 with 175B+ parameters), limiting accessibility, reproducibility, and cost-efficiency. We present ChainRepair, an autonomous multi-agent APR framework built upon a 7B-parameter open-source model. ChainRepair coordinates five specialized agents through structured chain prompting to enable decomposed reasoning without task-specific fine-tuning. The framework systematically incorporates dynamic execution feedback, grounding repair decisions in empirical evidence rather than static pattern matching.
We evaluate ChainRepair on the QuixBugs benchmark, where it repairs 82.5% of defects (33/40) while generating only three candidate patches per bug. Compared with proprietary-model baselines, our approach achieves a 25x reduction in model size and a 40x improvement in sample efficiency. These results demonstrate that architectural decomposition and evidence-driven reasoning can substantially mitigate the limitations of smaller open-source models. Our findings highlight a practical and reproducible pathway toward accessible, high-performance LLM-based APR.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public.
Paper Type: Full-length papers (i.e., case studies, theoretical, or applied research papers). 8 pages
Reroute: false
Submission Number: 26