Keywords: Reinforcement Learning, Unit Tests, Code Generation, Synthetic Data, Curriculum Learning, Mutation Testing
Abstract: We present TestSmith, a reinforcement learning approach for training language
models to generate unit tests that directly optimize for fault detection. Existing
methods rely on coverage proxies or human-written test suites as a training signal,
but coverage correlates weakly with fault detection and curated test data is scarce,
creating a circular dependency between test quality evaluation and test generation.
TestSmith breaks this loop by using automatically generated synthetic bugs (both
AST-based mutations and LLM-produced semantic perturbations spanning logic
errors, boundary mistakes, type errors, and incomplete handling) as an execution-
based reward signal, with an explicit penalty for tests that fail on correct code.
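To make this reward concrete, the sketch below pairs a toy AST-based operator mutator with a kill-rate reward; all names, the operator table, and the penalty value are illustrative assumptions, not TestSmith's exact configuration.

```python
import ast

# Illustrative sketch only: mutation operators, names, and the penalty
# value are assumptions, not the paper's exact design.

class OperatorSwap(ast.NodeTransformer):
    """Create a boundary-mistake mutant by swapping comparison operators,
    e.g. `<` becomes `<=`."""
    SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [self.SWAPS.get(type(op), type(op))() for op in node.ops]
        return node

def mutate(source: str) -> str:
    """Return a syntactically valid mutant of `source` (Python 3.9+)."""
    tree = OperatorSwap().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

def test_suite_reward(passes_on_correct: bool, killed: list[bool],
                      false_positive_penalty: float = -1.0) -> float:
    """Score a generated test suite by the fraction of synthetic bugs it
    kills; penalize suites that fail on the unmutated reference code."""
    if not passes_on_correct:
        return false_positive_penalty   # explicit penalty for false alarms
    if not killed:
        return 0.0
    return sum(killed) / len(killed)    # mutation kill rate in [0, 1]
```

The penalty branch is what discourages degenerate suites that reject the correct reference implementation.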
We train a Qwen3-8B policy with Group Relative Policy Optimization (GRPO) and a
three-stage curriculum that progresses from single-function problems with operator
changes, to single-function problems with LLM-generated bugs, to multi-file
repository tasks, addressing the reward sparsity that otherwise prevents learning in
complex settings.
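GRPO scores each sampled test suite relative to the other suites drawn for the same problem; a minimal sketch of that group-normalized advantage follows (the epsilon and the exact grouping details are assumptions).

```python
import statistics

# Minimal sketch of GRPO's group-relative advantage; eps and the notion
# of a "group" (all suites sampled for one problem) are assumptions.

def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """advantage_i = (r_i - mean(r)) / (std(r) + eps), computed within
    the group of test suites sampled for the same problem."""
    mean_r = statistics.fmean(rewards)
    std_r = statistics.pstdev(rewards)
    return [(r - mean_r) / (std_r + eps) for r in rewards]
```

The curriculum matters here: on hard multi-file tasks early in training, every reward in a group can be zero, leaving the advantages with no learning signal, which is the sparsity the staged progression is meant to avoid.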
On a held-out benchmark of 500 problems, TestSmith raises Pass@5 from 56.50%
to 74.19%, achieves a mutation score of 100% (up from 74.99%), and reaches
99.22% line and branch coverage, demonstrating that synthetic bug rewards can
effectively align test generation with fault detection.
Submission Number: 97