Keywords: Reinforcement Learning, Unit Test Generation, Large Language Model
Abstract: Code verifiers are key to the post-verification process in large language model (LLM) code generation.
However, supervised fine-tuning (SFT) methods suffer from dataset scarcity, high error and failure rates, and severe inference latency.
In this work, we apply reinforcement learning to train an efficient code verifier, CVeDRL, which substantially alleviates these challenges and balances performance and efficiency at only 0.6B parameters.
First, we design syntax and functionality rewards and employ GRPO to train the base code verifier.
However, preliminary experiments indicated that the base model could not produce effective unit tests for difficult branches and samples.
We therefore propose Branch-Difficulty-aware and Sample-Difficulty-aware reinforcement learning based on exponential reward shaping and static-analysis metrics (Halstead Complexity and Maintainability Index).
Experimental results show that CVeDRL significantly outperforms the vanilla model while remaining competitive with state-of-the-art models such as GPT-4o-mini and GPT-3.5 on metrics including pass rate, assertion failure rate, and code coverage.
Furthermore, CVeDRL-0.6B improves inference efficiency by more than 20x compared with LLMs trained with SFT.
Code is available at https://anonymous.4open.science/r/CVeDRL-DF1A/
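For illustration only, the sketch below shows one plausible way a difficulty-aware exponential reward shaping term could combine the two static-analysis metrics named in the abstract (Halstead Complexity and the Maintainability Index). The function names, normalization constants, and weighting are assumptions for exposition, not the paper's actual formulation.

```python
import math

def difficulty_weight(halstead_difficulty: float,
                      maintainability_index: float,
                      alpha: float = 0.5) -> float:
    """Hypothetical difficulty score: a high Halstead difficulty and a low
    Maintainability Index (MI, conventionally 0-100) yield a larger
    exponential weight, so harder samples contribute more reward."""
    # MI near 100 = easy to maintain, near 0 = hard; invert it to a 0-1 difficulty term.
    mi_term = 1.0 - min(max(maintainability_index, 0.0), 100.0) / 100.0
    # Cap and rescale Halstead difficulty to roughly 0-1 (30 is an assumed upper bound).
    halstead_term = min(halstead_difficulty / 30.0, 1.0)
    # Equal-weight combination of the two static-analysis signals.
    difficulty = 0.5 * mi_term + 0.5 * halstead_term
    return math.exp(alpha * difficulty)

def shaped_reward(base_reward: float,
                  halstead_difficulty: float,
                  maintainability_index: float) -> float:
    """Scale the syntax/functionality reward by the exponential difficulty weight."""
    return base_reward * difficulty_weight(halstead_difficulty, maintainability_index)

# Example: a passing unit test on a hard-to-maintain sample earns a boosted reward.
print(shaped_reward(base_reward=1.0, halstead_difficulty=25.0, maintainability_index=20.0))
```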
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6002