# Gen-review: A Dataset and Large-scale Study of AI-Generated and Human-Authored Peer Reviews


## Installation
We use [uv](https://docs.astral.sh/uv/) to manage the virtual env.
Clone the repository and install the requirements:
```bash
# Navigate to the folder
cd path/to/genreview/folder
# Install uv if not already there
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create the environment for this repository
uv venv --python 3.11
# Activate the virtual env
source .venv/bin/activate
# Install the dependencies
uv pip install -r requirements.txt
```

Before computing anything, every script downloads the dataset from [Harvard Dataverse](https://dataverse.harvard.edu/). The dataset page is https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PYDPEZ.
**The dataset occupies 1.6GB on disk**, so make sure enough free space is available to store it.

We share all the data as a SQLite database, which makes it easy and efficient to fetch the needed information.
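As a quick sanity check after the download, the database can be inspected with Python's standard `sqlite3` module. The sketch below is only illustrative: the helper names `open_readonly` and `table_row_counts` are hypothetical and not part of this repository, and the location of the downloaded database file is an assumption you need to adapt.

```python
import sqlite3


def open_readonly(path: str) -> sqlite3.Connection:
    """Open a SQLite file in read-only mode so the dataset cannot be modified by accident."""
    # `path` is an assumption: point it to wherever the scripts stored the database.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Table names come from sqlite_master itself, so quoting them is sufficient here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }
```

For example, `table_row_counts(open_readonly("path/to/database.db"))` would list the tables described below together with their sizes, where `path/to/database.db` is a placeholder.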

## Definition of tables
We share three tables, whose schemas are given below.

### submission
It contains information about the submitted papers and is defined as:
```sql
CREATE TABLE SUBMISSION (
    id             TEXT    NOT NULL
                           PRIMARY KEY,
    paper_number   INTEGER NOT NULL,
    title          TEXT    NOT NULL,
    abstract       TEXT    NOT NULL,
    tldr           TEXT    NOT NULL,
    primary_area   TEXT    NOT NULL,
    code_of_ethics TEXT    NOT NULL,
    pdf            TEXT    NOT NULL,
    keywords       TEXT    NOT NULL,
    decision       TEXT    NOT NULL,
    when_submitted INTEGER NOT NULL,
    source_id      TEXT
);
```
### review
It contains the human-submitted reviews and is defined as:
```sql
CREATE TABLE REVIEW (
    paper_id                           TEXT NOT NULL,
    reviewer_id                        TEXT NOT NULL,
    summary                            TEXT NOT NULL,
    soundness                          TEXT NOT NULL,
    presentation                       TEXT NOT NULL,
    contribution                       TEXT NOT NULL,
    strength                           TEXT NOT NULL,
    weaknesses                         TEXT NOT NULL,
    questions                          TEXT NOT NULL,
    flag_for_ethics_review             TEXT NOT NULL,
    rating                             TEXT NOT NULL,
    confidence                         TEXT NOT NULL,
    correctness                        TEXT NOT NULL,
    technical_novelty_and_significance TEXT NOT NULL,
    empirical_novelty_and_significance TEXT NOT NULL,
    main_review                        TEXT NOT NULL,
    summary_of_the_review              TEXT NOT NULL,
    binocular_score                    REAL,
    PRIMARY KEY (
        paper_id,
        reviewer_id
    ),
    FOREIGN KEY (
        paper_id
    )
    REFERENCES PAPER (id) 
);
```
### genai_review
It contains the AI-generated reviews we created with ChatPDF and is defined as:

```sql
CREATE TABLE GENAI_REVIEW (
    paper_id        TEXT NOT NULL,
    type            TEXT CHECK (type IN ('neutral', 'positive', 'negative') ),
    generated       TEXT NOT NULL,
    rating          TEXT NOT NULL,
    binocular_score REAL,
    PRIMARY KEY (
        paper_id,
        type
    ),
    FOREIGN KEY (
        paper_id
    )
    REFERENCES submission (id) 
);
```
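The three tables are linked through the paper identifier: `paper_id` in `review` and `genai_review` corresponds to `id` in `submission`. As a minimal sketch of how this can be used, the hypothetical helper below (not part of this repository) collects the human and AI-generated reviews of one paper; it relies only on the column names shown in the schemas above.

```python
import sqlite3


def reviews_for_paper(conn: sqlite3.Connection, paper_id: str) -> dict:
    """Collect the submission record and all reviews attached to one paper id."""
    # Title and final decision of the paper (None if the id is unknown).
    submission = conn.execute(
        "SELECT title, decision FROM submission WHERE id = ?",
        (paper_id,),
    ).fetchone()
    # Human reviews: one row per reviewer.
    human = conn.execute(
        "SELECT reviewer_id, rating, binocular_score "
        "FROM review WHERE paper_id = ? ORDER BY reviewer_id",
        (paper_id,),
    ).fetchall()
    # AI-generated reviews: at most one row per type (neutral, positive, negative).
    genai = conn.execute(
        "SELECT type, rating, binocular_score "
        "FROM genai_review WHERE paper_id = ? ORDER BY type",
        (paper_id,),
    ).fetchall()
    return {"submission": submission, "human": human, "genai": genai}
```

Parameterized queries (`?`) are used so that paper identifiers are never interpolated into the SQL string.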

## How to materialize reviews
Different editions of ICLR use different review templates. Thus, many fields might be NULL inside the database.
In `utils/review_template.py` you can find a function that, given a Pandas dataframe containing a review record and the year, generates a string that follows the original review template.

For the AI-generated reviews, all the content is stored in the `generated` column.
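To illustrate the idea of materializing a human review, here is a simplified sketch that renders only the fields that are actually filled in. It is not the function from `utils/review_template.py`: the name `render_review`, the field order in `FIELD_ORDER`, and the heading format are all assumptions made for this example, and it takes a plain dictionary instead of a Pandas dataframe and ignores the year.

```python
# Hypothetical field order; the real per-year templates live in utils/review_template.py.
FIELD_ORDER = [
    "summary",
    "main_review",
    "strength",
    "weaknesses",
    "questions",
    "summary_of_the_review",
    "rating",
    "confidence",
]


def render_review(record: dict) -> str:
    """Render the filled-in fields of a review record as a single string."""
    sections = []
    for field in FIELD_ORDER:
        value = record.get(field)
        # Skip fields that are missing, NULL, or blank for this edition's template.
        if value is None or not str(value).strip():
            continue
        heading = field.replace("_", " ").capitalize()
        sections.append(f"{heading}:\n{str(value).strip()}")
    return "\n\n".join(sections)
```

A record can be built from a database row, for instance by setting `conn.row_factory = sqlite3.Row` and passing `dict(row)` to the function.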

## Execute code
To execute the code, from the source directory launch the requested script as a Python module as follows:
```bash
# To plot the bias induced by LLMs
python -m dataset_metrics.fig_bias_in_decision
# To plot the length of reviews
python -m dataset_metrics.fig_length_constraints
# To plot the distribution of scores of Binoculars
python -m dataset_metrics.fig_scores_binoculars
# To plot the agreement between human- and AI-generated reviews
python -m dataset_metrics.fig_agreement
```