MAGIC: Multi-Armed Bandit Guided Iterative Code Generation

ACL ARR 2025 May Submission7409 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have shown remarkable capabilities in code generation, yet they often struggle with solution diversity and competition-level problems. In this paper, we introduce **MAGIC** (**M**ulti-**A**rmed bandit **G**uided **I**terative **C**ode generator), an approach that formalizes plan selection in LLM-based code generation as a Multi-Armed Bandit (MAB) problem, enabling systematic exploration of diverse solution strategies. The method disentangles the generation process into three phases: explicit plan generation, code implementation, and code refinement. Treating each candidate plan as an arm of the bandit, we employ an adapted Upper Confidence Bound (UCB) algorithm that balances exploration of different solution strategies with exploitation of promising plans. To constrain code refinement to the current plan and keep exploitation focused on its region of the solution space, we formalize plans as code skeletons. Experiments on HumanEval, HumanEval+, CodeContest, and APPS demonstrate significant improvements over existing methods, with pass@1 reaching 97.0% on HumanEval and 45.5% on CodeContest using GPT-4o. Through variance-based diversity metrics, we show that MAGIC substantially increases solution diversity, which particularly benefits performance on challenging competitive programming tasks.
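To make the plan-as-arm formulation concrete, below is a minimal UCB1-style sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the class name `UCBPlanSelector`, the exploration constant `c`, and the use of the visible-test pass rate as the reward signal are all assumptions, since the abstract does not specify the details of the adapted UCB algorithm.

```python
import math

class UCBPlanSelector:
    """UCB1-style selector over candidate plans (arms).

    Hypothetical sketch: rewards are assumed to lie in [0, 1], e.g. the
    fraction of visible tests passed by code generated under a plan.
    """

    def __init__(self, num_plans, c=math.sqrt(2)):
        self.c = c                        # exploration constant (assumed value)
        self.counts = [0] * num_plans     # times each plan has been tried
        self.values = [0.0] * num_plans   # running mean reward per plan

    def select(self):
        # Try every plan once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        # UCB1 score: mean reward plus a confidence bonus that favors
        # rarely tried plans, trading off exploitation and exploration.
        scores = [
            self.values[i] + self.c * math.sqrt(math.log(total) / self.counts[i])
            for i in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, plan_idx, reward):
        # Incremental mean update after observing a reward for a plan.
        self.counts[plan_idx] += 1
        self.values[plan_idx] += (reward - self.values[plan_idx]) / self.counts[plan_idx]

# Illustrative loop: 4 candidate plans; the reward value is a placeholder.
selector = UCBPlanSelector(num_plans=4)
plan = selector.select()        # pick a plan to implement or refine next
# ... generate code under `plan`, run visible tests, measure pass rate ...
selector.update(plan, reward=0.75)
```

In this framing, each call to `select` picks the plan whose code skeleton is implemented or refined next, and each call to `update` folds the observed test outcome back into that arm's statistics.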
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: code generation, large language models, multi-armed bandit, planning, agent
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7409