# Artifact of EvalPlus

## Requirements and setup

- Python 3.8+
- NVIDIA GPU

To get started, first run:

```shell
pip install -r requirements.txt
pip install -r requirements-tools.txt
pip install -r requirements-llm.txt
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
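
Before installing, it can help to confirm that the machine meets the two requirements listed above. The following is a minimal sketch, not part of the artifact: the function names are hypothetical, and checking for `nvidia-smi` on the `PATH` is only a heuristic for "an NVIDIA GPU with a driver is present".

```python
import shutil
import sys


def meets_python_requirement(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def nvidia_smi_on_path():
    """Heuristic (an assumption, not an EvalPlus check): is `nvidia-smi` installed?"""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python 3.8+:", meets_python_requirement())
    print("nvidia-smi found:", nvidia_smi_on_path())
```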

## Generate LLM-synthesized code

Generating LLM samples can be costly, time-consuming, and complicated, so we have prepared pre-generated samples for you to play with!

To extract pre-generated samples:

```shell
# install 7z to unzip the pre-generated samples
sudo apt install p7zip-full p7zip-rar
7z x samples.7z
```

You can also regenerate them with the following commands, taking `santacoder` as an example:

```shell
python codegen/generate.py --model santacoder --root ./pregen --bs 1  --temperature 0.0 --n_samples 1 --resume --greedy
python codegen/generate.py --model santacoder --root ./pregen --bs 20 --temperature 0.2 --n_samples 200 --resume
python codegen/generate.py --model santacoder --root ./pregen --bs 20 --temperature 0.4 --n_samples 200 --resume
python codegen/generate.py --model santacoder --root ./pregen --bs 20 --temperature 0.6 --n_samples 200 --resume
python codegen/generate.py --model santacoder --root ./pregen --bs 20 --temperature 0.8 --n_samples 200 --resume
```
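
The five commands above differ only in batch size, temperature, sample count, and the `--greedy` flag, so the sweep can be written as a small table. The sketch below is a hypothetical helper (it is not shipped with the artifact) that rebuilds the same command lines for a given model name. It only prints them, so it is safe to run as a dry run.

```python
import shlex

# (batch size, temperature, number of samples, extra flags),
# copied from the five commands listed above.
SWEEP = [
    (1, 0.0, 1, ["--greedy"]),
    (20, 0.2, 200, []),
    (20, 0.4, 200, []),
    (20, 0.6, 200, []),
    (20, 0.8, 200, []),
]


def build_commands(model, root="./pregen"):
    """Return one argument list per sweep entry for codegen/generate.py."""
    commands = []
    for bs, temperature, n_samples, extra in SWEEP:
        commands.append([
            "python", "codegen/generate.py",
            "--model", model,
            "--root", root,
            "--bs", str(bs),
            "--temperature", str(temperature),
            "--n_samples", str(n_samples),
            "--resume",
            *extra,
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_commands("santacoder"):
        print(shlex.join(cmd))
```

To actually launch the runs, each argument list could be passed to `subprocess.run(cmd, check=True)` from the repository root.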

## Run Evaluation

```shell
./evo.sh
# It can take a few hours to run all samples.
```

## Test-case Reduction

> **Note** EvalPlus uses mutmut to generate mutants. According to the [doc](https://mutmut.readthedocs.io/en/latest/#:~:text=support%20python%203.4%2C%203.5%20and%203.6), mutmut only supports Python 3.6 and earlier, so you should run mutation testing in a separate environment. If you use conda, you can prepare such an environment using the following script:
> 
> ```shell
> conda create -n mutmut pip python=3.6
> conda activate mutmut
> pip install -r requirements.txt
> ```
> 

As test-case reduction relies on the results of evaluation, make sure that you've run `./evo.sh` beforehand.

**Step 1**

In the mutmut environment:
```shell
python3 evalplus/tsr/run.py --mutation_only
```

**Step 2**

In the original EvalPlus environment:
```shell
python3 evalplus/tsr/run.py --model MODEL
```
* If `MODEL` is a specific LLM name, the cross-validation results will be generated in `./tsr_info`.
* If `MODEL == ALL`, a reduced dataset will be generated in `./tsr_info`.

## Generate Tests for HumanEval+ from Scratch

To generate new tests using EvalPlus, first export your OpenAI key (used to query ChatGPT):
```shell
export OPENAI_API_KEY='your_key'
```
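
Input generation queries ChatGPT, so a missing key would only surface once the script is already running. As a minimal sketch, a check like the following could be run first. The function name is hypothetical and this check is not part of EvalPlus.

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if not has_openai_key():
        raise SystemExit("OPENAI_API_KEY is not set; export it first.")
    print("OPENAI_API_KEY is set.")
```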

Then run the `inputgen.py` script, specifying the number of ChatGPT seed inputs to generate (`--chatgpt_len`) and the number of mutation-based inputs to generate (`--mut_len`):
```shell
python evalplus/inputgen.py --dataset humaneval --chatgpt_len 30 --mut_len 1000 --output HumanEvalPlusInputs.jsonl
```
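
Once the command finishes, a quick sanity check on the output file can catch truncated or malformed runs. The sketch below assumes only what the `.jsonl` extension suggests, namely one JSON value per line. It makes no assumption about the fields inside each record, and the helper name is hypothetical.

```python
import json


def count_records(lines):
    """Count non-blank lines, raising ValueError if any is not valid JSON."""
    count = 0
    for line in lines:
        if line.strip():
            json.loads(line)  # json.JSONDecodeError is a ValueError subclass
            count += 1
    return count


if __name__ == "__main__":
    with open("HumanEvalPlusInputs.jsonl", encoding="utf-8") as f:
        print("records:", count_records(f))
```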
