Adversarial Test Case Generation via Reinforcement Learning Extends Scaling Laws

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Reinforcement Learning, Large Language Model
Abstract: Rule-based reinforcement learning (RL) has greatly advanced the coding capabilities of large language models (LLMs). However, existing RL methods remain largely confined to code generation, relying on fixed test cases for evaluation and leaving the problem of test case generation underexplored. Generating diverse and adversarial test cases is critical, as it not only enriches coding knowledge but also enables effective self-verification during inference. Recent supervised learning approaches attempt to jointly train code and test case generation during the post-training stage. Yet, these methods fall short: supervised learning inherently lags behind RL in coding performance, and the resulting test cases often lack diversity and adversarial quality, limiting their ability to identify erroneous code. To address these limitations while retaining the advantages of RL, we propose Test Cases Scaling (TCS), a two-stage reinforcement learning framework for learning to generate high-quality adversarial test cases. TCS employs stage-specific reward functions and a policy-aligned training buffer to progressively enhance test case quality and alignment with the evolving model. Experimental results on TACO and LiveCodeBench show that TCS significantly outperforms supervised baselines in both code and test case generation during both training and inference. Furthermore, adversarial test cases generated by our trained TCS-7B model improve the inference-time performance of leading proprietary LLMs.
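To make the two-stage idea concrete, here is a minimal illustrative sketch (not the paper's actual implementation; the function names, reward values, and buffer policy are all assumptions). Stage 1 rewards a generated test case for validity, i.e. accepting a known-correct reference solution; stage 2 additionally rewards adversarial quality, i.e. rejecting ("killing") an incorrect candidate program. A small fixed-capacity buffer stands in for the policy-aligned training buffer by retaining only recent, on-policy samples.

```python
from collections import deque


def stage_reward(stage, passes_reference, kills_candidate):
    """Hypothetical two-stage reward for one generated test case.

    passes_reference: the test accepts the known-correct solution.
    kills_candidate: the test rejects an incorrect candidate program.
    Values (-1.0, 0.1, 1.0) are illustrative, not from the paper.
    """
    if not passes_reference:
        return -1.0  # invalid test: it rejects correct code
    if stage == 1:
        return 1.0   # stage 1: validity alone is rewarded
    # Stage 2: a valid test earns more if it exposes buggy code.
    return 1.0 if kills_candidate else 0.1


class PolicyAlignedBuffer:
    """Keeps only the most recent samples so training data stays
    aligned with the evolving generator policy (an assumed design)."""

    def __init__(self, capacity=4):
        self.buf = deque(maxlen=capacity)  # oldest samples evicted first

    def add(self, test_case, reward):
        self.buf.append((test_case, reward))

    def batch(self):
        return list(self.buf)
```

Under this sketch, training would anneal from stage 1 to stage 2 so the generator first learns to emit valid tests, then learns to make them adversarial.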
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12230