ICE-Coder: Integrating White-box and Black-box Testing in Execution-guided Multi-agent Code Generation

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: code generation, test generation, multi-agent planning
TL;DR: We layer multi-agent LLM coding with white-box test generation (inspired by coverage-guided testing and code reviews), standard black-box tests, and LLM deliberation on outputs, raising LiveCodeBench‑Hard solves from 55/90 to 72/90.
Abstract: LLM-based coding agents are programs that utilise LLMs to automate code generation tasks. Typically, they incorporate code execution capabilities which, together with automated test generation and/or debugging methods, enhance the reliability of the generated code. However, the effectiveness of these approaches remains limited on complex problems (such as competitive programming problems) where bugs surface only in convoluted edge cases. This work builds upon multi-agent code generation techniques that emulate software engineering environments. In particular, to address obscure edge cases, we take inspiration from code coverage tools and code reviews to generate white-box tests on top of existing black-box test generation approaches. Test case outputs are validated through a process of LLM deliberation. By increasing the quantity and quality of the test cases, we obtain more reliable generated code. We evaluated ICE-Coder on LiveCodeBench-Hard: out of 90 problems, it solves 72, compared to the baseline of 55.
Primary Area: applications to robotics, autonomy, planning
Supplementary Material: zip
Submission Number: 7259