# Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

We propose **CURE**, a novel reinforcement learning framework that co-evolves a coder and a unit tester to improve the overall coding ability of large language models.

<p align="center">
  <img src="figures/pipeline.png"  alt="Pipeline of CURE"  width="600">
</p>




## Environment Setup

```bash
conda create --name CURE python=3.10
conda activate CURE
pip install torch
pip install -r requirements.txt
pip install --no-cache-dir \
  https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/\
flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

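The prebuilt wheel above targets Python 3.10, CUDA 12, and PyTorch 2.6 on Linux x86_64. If your environment differs, install [FlashAttention](https://github.com/Dao-AILab/flash-attention) according to your versions of PyTorch and CUDA.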
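To check whether a given wheel fits your environment, it can help to split its filename into build tags. The sketch below assumes that release wheels follow the naming pattern visible in the install command above; `parse_wheel_name` and `python_tag` are hypothetical helpers written for illustration and are not part of this repository or of FlashAttention.

```python
import re
import sys

# Pattern inferred from the wheel filename used in the install command above.
# This is an assumption: other releases may not follow it exactly.
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[^+]+)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)"
    r"-cp(?P<python>\d+)-cp\d+-(?P<platform>.+)\.whl$"
)


def parse_wheel_name(filename: str) -> dict:
    """Split a FlashAttention wheel filename into its build tags."""
    match = WHEEL_RE.search(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def python_tag() -> str:
    """CPython tag of the running interpreter, e.g. '310' for Python 3.10."""
    return f"{sys.version_info.major}{sys.version_info.minor}"


wheel = "flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
tags = parse_wheel_name(wheel)
print(tags)
print("Python matches wheel:", tags["python"] == python_tag())
```

Once PyTorch is installed, you can compare the `torch` and `cuda` tags against `torch.__version__` and `torch.version.cuda` in the same way.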


## Dataset

**CodeContests**, **CodeForces**, and **MBPP** are included in `./evaluation`. To download **LiveBench** and **LiveCodeBench**, run `./evaluation/download_livebench.py` and `./evaluation/download_livecodebench.py`, respectively.


## Evaluation

After downloading the datasets, you can run a comprehensive evaluation with our benchmark, which covers **one-shot coding**, **unit test generation**, and **Best-of-N (BoN) evaluation**, and supports both API-based and vLLM-based inference. Detailed instructions are in `./evaluation`; you only need to modify `./evaluation/evaluation_config.py` and then run `python eval.py` to evaluate your model's coding ability.
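For readers new to Best-of-N evaluation, the idea can be illustrated with a toy example: sample N code candidates, run each against a set of unit tests, and keep the candidate that passes the most tests. The sketch below is a minimal illustration under that assumption, not the implementation in `./evaluation`; `run_tests`, `best_of_n`, and the toy candidates are hypothetical names introduced here.

```python
from typing import Any, Callable, List, Sequence, Tuple


def run_tests(
    candidates: Sequence[Callable[[Any], Any]],
    tests: Sequence[Tuple[Any, Any]],
) -> List[List[bool]]:
    """Build a pass matrix: matrix[i][j] is True if candidate i passes test j."""
    matrix = []
    for fn in candidates:
        row = []
        for arg, expected in tests:
            try:
                row.append(fn(arg) == expected)
            except Exception:
                # A candidate that crashes on a test counts as failing it.
                row.append(False)
        matrix.append(row)
    return matrix


def best_of_n(pass_matrix: Sequence[Sequence[bool]]) -> int:
    """Return the index of the candidate that passes the most tests.

    Ties are broken in favor of the earliest candidate.
    """
    if not pass_matrix:
        raise ValueError("need at least one candidate")
    scores = [sum(row) for row in pass_matrix]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for model-generated solutions to "square the input".
candidates = [
    lambda x: x + x,         # right only when x + x happens to equal x * x
    lambda x: x * x,         # correct
    lambda x: 1 // (x - x),  # always raises ZeroDivisionError
]
# Toy stand-ins for model-generated unit tests: (input, expected output).
tests = [(2, 4), (3, 9), (0, 0)]

matrix = run_tests(candidates, tests)
best = best_of_n(matrix)
print("pass matrix:", matrix)
print("selected candidate index:", best)
```

In the actual benchmark, both the candidates and the unit tests come from the model under evaluation, so the quality of the selection depends on the accuracy of the generated tests.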


## CURE Optimization

To start CURE optimization, run the following command, which optimizes the standard base model, Qwen2.5-7B-Instruct.

```bash
python run.py
```

To train your model with customized hyperparameters, modify the configuration in `./optimization/optimization_config.py`. See `./optimization` for details.












