# stackelberg-learning

## Setup
Create a new virtual environment with Python `3.11.6` and install the packages listed in `requirements.txt`.
```commandline
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
If you train on a GPU, make sure to install the `torch` build that matches your CUDA version.

Training runs are logged to Weights & Biases (W&B). To log experiments to your account,
save your API key to the file `${HOME}/.wandb-api-key`. You can find your API key in your W&B account settings.

## Dataset preparation

### HelpSteer2
1. Preprocess the dataset and save the train-validation split with the following snippet
(e.g., in a Jupyter notebook):
```python
from src.preprocessing.helpsteer2 import load_dataset as helpsteer2_load_dataset
dataset = helpsteer2_load_dataset(seed=42, train_validation_split=0.8)
dataset.save_to_disk("path/to/preprocessed_helpsteer2_dataset")
```

2. Train a separate reward model for each attribute using the following script:
```bash
bash scripts/train_reward_model.sh attribute_name
```
Run it once for each `attribute_name` in `helpfulness`, `correctness`, `coherence`, `complexity`, `verbosity`.
Before training, set the correct paths in `scripts/train_reward_model.sh`
and adjust the parameters to your available resources.
We recommend setting the `output_dir` variable such that it includes the attribute name, e.g.,
`path/to/reward_model/attribute_name`.
All fine-tuning and evaluation scripts expect the `reward_model_adapters_path` argument in the following form:
`path/to/reward_model/(attribute1|attribute2|attribute3|...)` where the attributes are separated by `|`.
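The per-attribute training runs and the adapter path convention can be sketched in Python as follows. These are hypothetical helpers, not part of the repository: `build_adapters_path` assumes the parenthesized `(a|b|c)` group is passed literally as the last path component, and `expand_adapters_path` assumes each attribute resolves to its own subdirectory of the base path, matching the recommended `output_dir` layout.

```python
from pathlib import PurePosixPath

# Attribute names used for the HelpSteer2 reward models (from the list above).
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]


def training_commands(attributes: list[str]) -> list[list[str]]:
    """Commands that train one reward model per attribute (run from the repository root)."""
    return [["bash", "scripts/train_reward_model.sh", a] for a in attributes]


def reward_model_output_dir(base_dir: str, attribute: str) -> str:
    """Recommended per-attribute output_dir, e.g. path/to/reward_model/helpfulness."""
    return str(PurePosixPath(base_dir) / attribute)


def build_adapters_path(base_dir: str, attributes: list[str]) -> str:
    """Join attribute names into path/to/reward_model/(a1|a2|...)."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    if any("|" in a or "/" in a for a in attributes):
        raise ValueError("attribute names must not contain '|' or '/'")
    return f"{base_dir.rstrip('/')}/({'|'.join(attributes)})"


def expand_adapters_path(adapters_path: str) -> list[str]:
    """Inverse of build_adapters_path: one directory per attribute."""
    base, _, group = adapters_path.rpartition("/")
    if not (group.startswith("(") and group.endswith(")")):
        raise ValueError(f"expected '<base>/(a1|a2|...)', got {adapters_path!r}")
    names = group[1:-1].split("|")
    return [f"{base}/{name}" if base else name for name in names]
```

For example, `subprocess.run(cmd, check=True)` for each `cmd` in `training_commands(ATTRIBUTES)` trains all five models in sequence, and `build_adapters_path("path/to/reward_model", ATTRIBUTES)` yields the string to pass as `reward_model_adapters_path`.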

## Fine-Tuning
The training scripts for `RLOO`, `Nash-MD`, and `StackelbergGDA` are located in the `scripts` folder.
Before executing the scripts, make sure to update the paths and parameters.
By default, the results are saved to `data/experiments/${run_name}`.
Execute each script from the root directory of the repository.

## Evaluation
To evaluate a model, set the models and datasets to use in `scripts/evaluation.sh`.
Generated responses are saved to `path/to/experiment/generation__checkpoint-x` and corresponding rewards for each
attribute are saved to `path/to/experiment/generation__checkpoint-x__rewards`.

To run correction evaluations across multiple models, use the `correction_evaluation.sh` script.
