Keywords: evolutionary agent; inference-time search; unit testing
TL;DR: A stateful evolutionary agent that iteratively refines LLM-generated unit tests using execution feedback.
Abstract: We present **UT-Evolve**, an evolutionary agent that improves the unit test generation capability of large language models through iterative interaction with a testing environment. UT-Evolve orchestrates a stateful agent loop that refines unit tests over multiple iterations using feedback from prior executions, including failures and coverage signals. In contrast to most existing coding agents, which rely on stateless, single-step inference, UT-Evolve maintains persistent state across generations, enabling long-horizon reasoning and adaptive test generation strategies.
We argue that the absence of persistent state fundamentally limits LLM performance on tasks such as unit test generation, where effective solutions require accumulating task-specific knowledge and adapting strategies over time. Rather than producing predominantly surface-level happy-path tests, UT-Evolve prioritizes edge cases and systematically probes latent assumptions, using observed failures to evolve its test generation policy.
In a **preliminary** evaluation on a filtered version of TestGenEval (TestGenEvalMini), UT-Evolve achieves higher test coverage than single-step stateless baselines across multiple LLM families. These results suggest that UT-Evolve offers a promising direction for enabling generative models to learn effective behaviors through sustained interaction with complex environments.
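To make the stateful loop described in the abstract concrete, the following is a minimal toy sketch, not the UT-Evolve implementation. Every name in it (`bucket`, `Feedback`, `AgentState`, `run_test`, `propose`, `evolve`) is hypothetical. The LLM is replaced by a fixed mutation rule, the testing environment by a four-branch function, and a "test" is reduced to a single input value. It only illustrates how coverage and failure signals kept across generations can reach a branch that a single step from the same starting point misses.

```python
from dataclasses import dataclass, field
from typing import List, Set


def bucket(x: int) -> str:
    """Hypothetical function under test: four branches, one of which raises."""
    if x < 0:
        raise ValueError("negative input")
    if x == 0:
        return "zero"
    if x > 10:
        return "large"
    return "small"


@dataclass
class Feedback:
    branch: str   # branch reached by the input (coverage signal)
    failed: bool  # whether execution raised (failure signal)


@dataclass
class AgentState:
    """State that persists across generations (assumed structure)."""
    suite: List[int] = field(default_factory=list)     # inputs kept as tests
    covered: Set[str] = field(default_factory=set)     # branches reached so far
    failures: List[int] = field(default_factory=list)  # inputs that raised
    history: List[int] = field(default_factory=list)   # coverage per generation


def run_test(x: int) -> Feedback:
    """Stand-in for the testing environment."""
    try:
        return Feedback(branch=bucket(x), failed=False)
    except ValueError:
        return Feedback(branch="negative", failed=True)


def propose(state: AgentState) -> List[int]:
    """Stand-in for the LLM: mutate every kept input with fixed edge-case probes."""
    seeds = state.suite or [1]  # start from a single happy-path input
    candidates: List[int] = []
    for base in seeds:
        candidates.extend([base - 1, base + 1, -base, base * 10])
    return candidates


def evolve(generations: int = 3) -> AgentState:
    state = AgentState()
    for _ in range(generations):
        for x in propose(state):
            fb = run_test(x)
            if fb.failed and x not in state.failures:
                state.failures.append(x)  # remember observed failures
            if fb.branch not in state.covered:
                state.covered.add(fb.branch)  # keep only coverage-adding inputs
                state.suite.append(x)
        state.history.append(len(state.covered))
    return state
```

In this toy setting, `evolve(generations=1)` stands in for a single-step baseline and covers three of the four branches, while `evolve(generations=3)` reuses the kept inputs as seeds and reaches the fourth. This says nothing about the reported results; it only shows the mechanism of carrying state between generations.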
Submission Number: 77