Keywords: Code Generation, Test Generation, Self-learning
TL;DR: We propose a self-play framework where a language model improves its coding and testing skills by having them compete and learn from each other.
Abstract: Recent breakthroughs in Large Language Models (LLMs) have significantly advanced code generation. However, further progress is increasingly constrained by the limited availability of high-quality supervised data. Synthetic data generation via self-instruction shows potential, but naive approaches often suffer from error accumulation and generalization collapse, underscoring the critical need for robust quality control. This paper introduces Sol-Ver, a novel self-play framework where an LLM simultaneously acts as a solver (generating code) and a verifier (generating tests). These two capabilities are mutually enhanced: improved tests lead to better code, which in turn enables the generation of more discerning tests. Sol-Ver iteratively refines both code solutions and their corresponding unit tests, jointly improving both functionalities without requiring human annotations or larger, more capable teacher models. Our experiments using Llama 3.1 8B demonstrate substantial gains, achieving average relative improvements of 19.63% in code generation (pass@1) and 17.49% in test generation accuracy on the MBPP and LiveCodeBench benchmarks.
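As a rough illustration of the loop the abstract describes, the sketch below shows one self-play iteration in Python: the same model acts as solver and verifier, and only code/test pairs that agree with each other are kept as training data. The interfaces here (generate_code, generate_tests, passes) are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
from typing import Callable, Iterable

def self_play_round(
    problems: Iterable[str],
    generate_code: Callable[[str], list[str]],    # solver role: problem -> candidate programs
    generate_tests: Callable[[str], list[str]],   # verifier role: problem -> candidate test suites
    passes: Callable[[str, str], bool],           # executes a test suite against a program
) -> list[dict]:
    """Collect mutually consistent code/test pairs (the quality-control filter)."""
    curated = []
    for problem in problems:
        for code in generate_code(problem):
            for tests in generate_tests(problem):
                # Keep only pairs where generated code passes generated tests;
                # agreement between the two roles is the supervision signal,
                # and the model would then be fine-tuned on `curated`.
                if passes(code, tests):
                    curated.append({"problem": problem, "code": code, "tests": tests})
    return curated
```

Iterating this filter-then-fine-tune cycle is what the abstract refers to as jointly improving the solver and verifier without human annotations or a stronger teacher model.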
Submission Number: 61