# Peer Rank and Discussion Improve Large Language Model based Evaluations

## Peer Rank (PR) Process

![Peer Rank (PR) process](assets/peer_rank.png)

## Peer Discussion (PD) Process

![Peer Discussion (PD) process](assets/peer_discussion.png)

## Install Dependencies

Run the following commands to install the dependencies.

```bash
# create an environment
conda create -n prd python=3.8
conda activate prd

# install the dependencies from the requirements file with pip
pip install -r requirement.txt
```

## Datasets

We release the `Vicuna80` dataset in the `data` folder. For details about the datasets, see the [README](data/README.md) file.

## Generated Results

For details about the generated results, see the [README](results/README.md) file in the `results` folder.

## Run

Use the bash commands below to run each part.

### Peer Rank

Change into the `peer_rank` folder with the following command.

```bash
cd peer_rank/
```

#### Reviews Generation

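To generate reviews for the answers of one pair of models, run the corresponding `gen_{reviewer}.sh` script with the two answer files, replacing `[Model 1]` and `[Model 2]` with the model names used in the answer file names. For example: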

```bash
./gen_claude.sh ../data/vicuna80/generations/answer_[Model 1].jsonl ../data/vicuna80/generations/answer_[Model 2].jsonl
```

To generate reviews for the answers of all pairs of models, run `gen_{reviewer}_all.sh`. For example:

```bash
./gen_claude_all.sh
```
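If you only need reviews for a subset of models, or want to drive a different reviewer script, you can loop over answer pairs yourself. The following is a minimal sketch and not the logic of `gen_claude_all.sh`: it assumes (hypothetically) that all answer files match `answer_*.jsonl` in one directory and that the reviewer script takes two answer-file paths, as in the single-pair example above. `review_all_pairs` is a hypothetical helper name, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run a reviewer script
# on every unordered pair of answer files in a directory.
# Assumes answer files are named answer_<model>.jsonl and that the reviewer
# script accepts two answer-file paths, like gen_claude.sh.
review_all_pairs() {
  local gen_dir="$1"    # directory holding answer_*.jsonl files
  local reviewer="$2"   # reviewer script, e.g. ./gen_claude.sh
  local files=()
  local f i j

  # Collect existing answer files; skip the literal pattern if nothing matches.
  for f in "$gen_dir"/answer_*.jsonl; do
    if [ -e "$f" ]; then
      files+=("$f")
    fi
  done

  # Visit each unordered pair (i < j) exactly once.
  for ((i = 0; i < ${#files[@]}; i++)); do
    for ((j = i + 1; j < ${#files[@]}; j++)); do
      echo "Reviewing: ${files[i]} vs ${files[j]}"
      "$reviewer" "${files[i]}" "${files[j]}"
    done
  done
}

# Example usage (from inside peer_rank/):
# review_all_pairs ../data/vicuna80/generations ./gen_claude.sh
```

As written, the loop continues if the reviewer fails on one pair; append `|| return 1` to the reviewer call if you would rather stop at the first failure.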

#### Peer Ranking

To run peer ranking, open `peer_ranking.ipynb` in Jupyter Notebook (or any compatible environment) and run its cells.

### Peer Discussion

Change into the `peer_discussion` folder with the following command.

```bash
cd peer_discussion/
```

Before running any Python script, make sure `config.yml` contains the correct configuration for your setup.

#### Reviews Generation

```bash
python review_lfqa.py
```

`review_lfqa.py` generates reviews for LFQA only. There is no separate script for Vicuna80, because its reviews are produced by the Peer Rank scripts.

#### Discussion Generation

```bash
# run discussions on LFQA
python gather_all_lfqa.py
python discuss_lfqa.py

# run discussions on Vicuna80
python gather_all_vicuna80.py
python discuss_vicuna80.py
```

