Adversarial Iterative Unit Test Generation with Large Language Models

Published: 02 Mar 2026, Last Modified: 30 Mar 2026
Venue: Agentic AI in the Wild: From Hallucinations to Reliable Autonomy (Poster)
License: CC BY 4.0
Keywords: Unit test generation, Large language model
Abstract: Unit testing is a critical practice in software development, aimed at detecting bugs and ensuring the robustness of individual program components. However, writing comprehensive unit tests is challenging, as it requires a deep understanding of the source code and thorough coverage of edge cases. In this work, we propose ADIT (ADversarial Iterative unit Test generation), a novel framework that generates high-quality unit tests through an adversarial process between two large language models (LLMs). Specifically, ADIT operates in two iterative steps: (1) an attacker LLM injects subtle bugs that evade existing tests, and (2) a defender LLM refines the test suite in a feedback loop until these bugs are detected. To evaluate ADIT in a realistic setting, we introduce two benchmarks: ClassTestEval, based on class-level Python source code, and CuPyTestEval, derived from the CuPy repository. Our results demonstrate that unit tests generated by ADIT outperform existing baselines in bug detection, highlighting the potential of adversarial LLM-based approaches for robust test generation.
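The two-step loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: both LLMs are replaced by deterministic stand-ins, and every name here (`reference`, `MUTANTS`, `attacker`, `defender`, `adversarial_loop`) is hypothetical. The attacker picks a buggy variant that the current tests fail to catch, and the defender adds a test that passes on the reference code but fails on that variant. The defender's multi-step feedback loop is collapsed into a single input search.

```python
from typing import Callable, List, Optional

# Hypothetical types for illustration only.
Program = Callable[[int], int]
Test = Callable[[Program], bool]  # returns True when the program passes


def reference(x: int) -> int:
    """Code under test: absolute value."""
    return x if x >= 0 else -x


# Stand-in for the attacker LLM's output: hand-written buggy variants.
MUTANTS: List[Program] = [
    lambda x: x,                        # bug: wrong for negative inputs
    lambda x: 1 if x == 0 else abs(x),  # bug: wrong only at zero
]


def passes_all(program: Program, tests: List[Test]) -> bool:
    return all(test(program) for test in tests)


def attacker(tests: List[Test]) -> Optional[Program]:
    """Return a buggy variant that evades every current test, if one exists."""
    for mutant in MUTANTS:
        if passes_all(mutant, tests):
            return mutant
    return None


def defender(mutant: Program) -> Optional[Test]:
    """Stand-in for the defender LLM: search for an input that exposes the bug."""
    for x in range(-5, 6):
        expected = reference(x)
        if mutant(x) != expected:
            # Bind x and expected now so each test keeps its own values.
            return lambda program, x=x, expected=expected: program(x) == expected
    return None


def adversarial_loop(tests: List[Test], max_rounds: int = 5) -> List[Test]:
    tests = list(tests)
    for _ in range(max_rounds):
        mutant = attacker(tests)
        if mutant is None:  # no surviving bug: the suite catches every variant
            break
        new_test = defender(mutant)
        # Accept a test only if the reference passes it and the mutant fails it.
        if new_test is None or not new_test(reference) or new_test(mutant):
            break
        tests.append(new_test)
    return tests


initial_tests: List[Test] = [lambda program: program(3) == 3]
final_tests = adversarial_loop(initial_tests)
print(len(final_tests))
```

In this toy setting the suite grows from one test to three, one new test per surviving variant, after which the attacker has no variant left that evades the suite.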
Submission Number: 13