# IRT-LLMS
Exploring item-response theory in the context of large language models.

# Data sources

We use microdata from ENEM (Exame Nacional do Ensino Médio, Brazil's national high school exam), available at
https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enem

# Requirements

- Python 3.9:
    ```pip3 install -r requirements```

- R, with the libraries ```mirtCAT```, ```arrow```, and ```PerFit```

# Data

## Processed ENEM exams
```sh
unzip data.zip
```

The password for ```data.zip``` is ```data```.
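
If the `unzip` command is not available, the archive can probably be extracted from Python instead. The sketch below is an illustration, not part of the repository: `extract_archive` is a hypothetical helper, and it assumes the archive uses legacy ZipCrypto encryption, since the standard-library `zipfile` module cannot decrypt AES-encrypted archives.

```python
import zipfile
from pathlib import Path


def extract_archive(archive="data.zip", dest=".", password="data"):
    """Extract a (possibly password-protected) zip and return its member names.

    Hypothetical helper: assumes legacy ZipCrypto encryption, which is the
    only scheme the standard-library zipfile module can decrypt.
    """
    dest = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        # The password must be bytes; it is ignored for unencrypted members.
        zf.extractall(path=dest, pwd=password.encode("utf-8"))
        return zf.namelist()
```

Calling `extract_archive()` from the repository root should then mirror `unzip data.zip`.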

## ENEM human data

```humans-irt-lz.parquet``` contains the ENEM human data, already processed. Download it from: https://drive.google.com/file/d/1cXc0Q8QNtBt-ueDtjsRiXHbPtrdTaZqG/view?usp=sharing
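
Once downloaded, the file can be inspected from Python. This is a minimal sketch, assuming the file was saved in the current directory and that `pandas` plus a parquet engine (`pyarrow` or `fastparquet`) are installed. `load_parquet` is a hypothetical helper, and the column layout is not documented here, so the sketch does not assume any column names.

```python
from pathlib import Path


def load_parquet(path="humans-irt-lz.parquet"):
    """Load a parquet file into a pandas DataFrame, failing early if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it first")
    # Imported lazily so the existence check works even without pandas installed.
    import pandas as pd

    # pandas needs a parquet engine (pyarrow or fastparquet) to read the file.
    return pd.read_parquet(path)
```

For a first look, `df = load_parquet()` followed by `print(df.shape)` and `print(df.columns.tolist())` shows the size and the available columns.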

# Minimal reproducibility 

## Run all experiments

```sh
./run_all_experiments.sh
```

## Process results

```sh
python3 aggregate_results.py
cd scripts/calculate-irt/
Rscript compute_irt_models.R
Rscript compute_lz_llms.R
cd ../..
python3 split_pythia_results.py
```

**All the main results are in the ```enem-experiments-results-processed.parquet``` file.**  
**Pythia results are in the ```enem-experiments-results-processed-pythia.parquet``` file.**
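
To confirm that the processing steps produced both files, a quick check along these lines may help. It is a sketch under one assumption: that the files are written to the directory you pass in (the repository root by default), which this README does not state. `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from this README; their location is an assumption.
EXPECTED_OUTPUTS = (
    "enem-experiments-results-processed.parquet",
    "enem-experiments-results-processed-pythia.parquet",
)


def missing_outputs(directory=".", expected=EXPECTED_OUTPUTS):
    """Return the expected result files that are absent from `directory`."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_outputs()` means both result files are present.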


# Plots

```sh
./run_all_plots.sh
```



