Keywords: Reinforcement Learning, Unit Test Generation, Large Language Model
Abstract: Code verifiers are key to the post-verification process in large language model (LLM) code generation.
However, supervised fine-tuning (SFT) methods suffer from dataset scarcity, high error and failure rates, and severe inference latency.
In this work, we apply reinforcement learning to train an efficient code verifier, CVeDRL, which substantially alleviates these challenges and balances performance and efficiency at only 0.6B parameters.
First, we design syntax and functionality rewards and employ GRPO to train the base code verifier.
However, preliminary experiments indicated that the base model could not produce effective unit tests for difficult branches and samples.
We therefore propose Branch-Difficulty-aware and Sample-Difficulty-aware reinforcement learning based on exponential reward shaping and static-analysis metrics (Halstead Complexity and Maintainability Index).
Experimental results show that CVeDRL significantly outperforms the vanilla model while remaining competitive with state-of-the-art models such as GPT-4o-mini and GPT-3.5 on metrics including pass rate, assertion failure rate, and code coverage.
Furthermore, CVeDRL-0.6B improves inference efficiency by more than 20x compared with LLMs trained with SFT.
Code is available at https://anonymous.4open.science/r/CVeDRL-DF1A/
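For illustration only, the sketch below shows one plausible way a difficulty-aware exponential reward shaping term could combine the two static-analysis metrics named in the abstract (Halstead Complexity and the Maintainability Index). The function names, normalization constants, and weighting are assumptions for exposition, not the paper's actual formulation.

```python
import math

def difficulty_weight(halstead_difficulty: float,
                      maintainability_index: float,
                      alpha: float = 0.5) -> float:
    """Hypothetical difficulty score: a high Halstead difficulty and a low
    Maintainability Index (MI, conventionally 0-100) yield a larger
    exponential weight, so harder samples contribute more reward."""
    # MI near 100 = easy to maintain, near 0 = hard; invert it to a 0-1 difficulty term.
    mi_term = 1.0 - min(max(maintainability_index, 0.0), 100.0) / 100.0
    # Cap and rescale Halstead difficulty to roughly 0-1 (30 is an assumed upper bound).
    halstead_term = min(halstead_difficulty / 30.0, 1.0)
    # Equal-weight combination of the two static-analysis signals.
    difficulty = 0.5 * mi_term + 0.5 * halstead_term
    return math.exp(alpha * difficulty)

def shaped_reward(base_reward: float,
                  halstead_difficulty: float,
                  maintainability_index: float) -> float:
    """Scale the syntax/functionality reward by the exponential difficulty weight."""
    return base_reward * difficulty_weight(halstead_difficulty, maintainability_index)

# Example: a passing unit test on a hard-to-maintain sample earns a boosted reward.
print(shaped_reward(base_reward=1.0, halstead_difficulty=25.0, maintainability_index=20.0))
```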
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6002