Keywords: Reinforcement Learning, Unit Tests, Code Generation, Synthetic Data, Curriculum Learning, Mutation Testing
Abstract: We present TestSmith, a reinforcement learning approach for training language
models to generate unit tests that directly optimize for fault detection. Existing
methods rely on coverage proxies or human-written test suites as a training signal,
but coverage correlates weakly with fault detection and curated test data is scarce,
creating a circular dependency between test quality evaluation and test generation.
TestSmith breaks this loop by using automatically generated synthetic bugs (both
AST-based mutations and LLM-produced semantic perturbations spanning logic
errors, boundary mistakes, type errors, and incomplete handling) as an execution-
based reward signal, with an explicit penalty for tests that fail on correct code.
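To make this reward concrete, the sketch below pairs a toy AST-based operator mutator with a kill-rate reward; all names, the operator table, and the penalty value are illustrative assumptions, not TestSmith's exact configuration.

```python
import ast

# Illustrative sketch only: mutation operators, names, and the penalty
# value are assumptions, not the paper's exact design.

class OperatorSwap(ast.NodeTransformer):
    """Create a boundary-mistake mutant by swapping comparison operators,
    e.g. `<` becomes `<=`."""
    SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [self.SWAPS.get(type(op), type(op))() for op in node.ops]
        return node

def mutate(source: str) -> str:
    """Return a syntactically valid mutant of `source` (Python 3.9+)."""
    tree = OperatorSwap().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

def test_suite_reward(passes_on_correct: bool, killed: list[bool],
                      false_positive_penalty: float = -1.0) -> float:
    """Score a generated test suite by the fraction of synthetic bugs it
    kills; penalize suites that fail on the unmutated reference code."""
    if not passes_on_correct:
        return false_positive_penalty   # explicit penalty for false alarms
    if not killed:
        return 0.0
    return sum(killed) / len(killed)    # mutation kill rate in [0, 1]
```

The penalty branch is what discourages degenerate suites that reject the correct reference implementation.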
We train a Qwen3-8B policy with Group Relative Policy Optimization (GRPO) and a
three-stage curriculum that progresses from single-function problems with operator
changes, to single-function problems with LLM-generated bugs, to multi-file
repository tasks, addressing the reward sparsity that otherwise prevents learning in
complex settings.
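GRPO scores each sampled test suite relative to the other suites drawn for the same problem; a minimal sketch of that group-normalized advantage follows (the epsilon and the exact grouping details are assumptions).

```python
import statistics

# Minimal sketch of GRPO's group-relative advantage; eps and the notion
# of a "group" (all suites sampled for one problem) are assumptions.

def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """advantage_i = (r_i - mean(r)) / (std(r) + eps), computed within
    the group of test suites sampled for the same problem."""
    mean_r = statistics.fmean(rewards)
    std_r = statistics.pstdev(rewards)
    return [(r - mean_r) / (std_r + eps) for r in rewards]
```

The curriculum matters here: on hard multi-file tasks early in training, every reward in a group can be zero, leaving the advantages with no learning signal, which is the sparsity the staged progression is meant to avoid.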
On a held-out benchmark of 500 problems, TestSmith raises Pass@5 from 56.50%
to 74.19%, achieves a mutation score of 100% (up from 74.99%), and reaches
99.22% line and branch coverage, demonstrating that synthetic bug rewards can
effectively align test generation with fault detection.
Submission Number: 97