Keywords: Code Generation, Agent, LLM, Test-Driven Development
Abstract: Large Language Models (LLMs) excel at code generation, yet ensuring the functional correctness of their outputs remains a persistent challenge.
While recent studies have applied Test-Driven Development (TDD) to refine code, these methods are fundamentally undermined by poor feedback quality, stemming from the scarcity of high-quality test cases and noisy signals from auto-generated ones.
In this work, we shift the focus from test quantity to feedback quality.
We introduce the Property-Generated Solver (PGS), a novel paradigm designed to generate highly effective feedback by adhering to two principles:
the feedback must be property-oriented, providing semantic guidance beyond simple I/O mismatches, and structurally minimal, reducing cognitive load and isolating the error's root cause.
PGS operates by checking high-level program properties (e.g., a sorting function must produce a non-decreasing sequence) and then providing the simplest failing counterexample to the LLM.
This property-driven, minimal feedback steers LLMs toward more correct and generalizable solutions.
Across a diverse suite of programming benchmarks, PGS consistently demonstrates superior corrective power, achieving a bug fix rate 1.4x-1.6x higher than the strongest debugging-based approaches and establishing a new state-of-the-art in automated code refinement.
The source code and data are available in the supplementary material.
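The feedback loop described in the abstract (check high-level properties, then report the simplest failing counterexample) can be sketched with plain Python. Note that `buggy_sort`, the property checks, and the greedy shrinking strategy below are illustrative assumptions, not the paper's actual implementation:

```python
import random

def buggy_sort(xs):
    # Hypothetical faulty LLM candidate: drops duplicate elements.
    return sorted(set(xs))

def is_non_decreasing(ys):
    return all(a <= b for a, b in zip(ys, ys[1:]))

def check_properties(candidate, xs):
    """Return a failure message if a high-level property is violated, else None."""
    ys = candidate(xs)
    if not is_non_decreasing(ys):
        return "output is not non-decreasing"
    if sorted(xs) != ys:  # multiset preservation, via a reference sort
        return "output is not a permutation of the input"
    return None

def shrink(candidate, xs):
    """Greedily delete elements while the failure persists, yielding a
    structurally minimal counterexample to show the LLM."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            smaller = xs[:i] + xs[i + 1:]
            if check_properties(candidate, smaller):
                xs, changed = smaller, True
                break
    return xs

def find_minimal_counterexample(candidate, trials=200, seed=0):
    """Random property-based search followed by shrinking."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if check_properties(candidate, xs):
            return shrink(candidate, xs)
    return None
```

For `buggy_sort`, shrinking collapses any failing input down to a two-element list of equal values, the smallest input on which dropping duplicates violates the permutation property; this minimal, property-tagged counterexample is the kind of feedback the abstract argues is most corrective.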
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8694