# Official Codebase

*Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options*

## Installation

The dependencies are listed in [`conda_requirements.txt`](./conda_requirements.txt) and [`pip_requirements.txt`](./pip_requirements.txt). A Conda environment is required; set it up as follows:

```bash
conda create -n rlhf python=3.10
conda activate rlhf
conda install --file conda_requirements.txt
pip install -r pip_requirements.txt
```
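
Before running anything, it can help to confirm that the `rlhf` environment is actually active. The helper below is a minimal sketch and not part of this repository: `check_env` is a hypothetical name, and it relies only on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```bash
# Hypothetical helper (not shipped with this codebase).
# Succeeds only if the named Conda environment is the active one.
check_env() {
    local expected="$1"
    if [[ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]]; then
        echo "expected Conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
    echo "Conda env '$expected' is active"
}
```

After `conda activate rlhf`, calling `check_env rlhf` should print a confirmation; in any other environment it returns a non-zero status.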

## Synthetic Data Experiment

Configure the synthetic environment in [`config.py`](./synthetic_experiment/config.py), then run the experiment:

```bash
cd synthetic_experiment
python3 main.py
```

## LLM-Based Experiment

From the repository root, navigate to the experiment directory:

```bash
cd LLM_experiment
```

The datasets used in our experiments on [TREC-DL](https://microsoft.github.io/msmarco/TREC-Deep-Learning) and [NECTAR](https://huggingface.co/datasets/berkeley-nest/Nectar) can be generated using the [`preprocessing.ipynb`](./LLM_experiment/preprocessing.ipynb) notebook.

After preparing the data:

1. Generate embedding vectors and Mistral model scores:

   ```bash
   bash make_embedding_score.sh
   ```

2. Run the experiment:

   ```bash
   bash run_experiment.sh
   ```

You can modify the experimental setup and algorithm configurations directly in [`run_experiment.sh`](./LLM_experiment/run_experiment.sh).
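
The two steps above can also be chained so that the experiment does not start if the embedding and scoring step fails. The following is a sketch only: `run_steps` is a hypothetical wrapper, not a script in this repository, and it assumes both scripts are invoked without arguments, as shown above.

```bash
# Hypothetical wrapper (not shipped with this codebase).
# Runs each script in order and stops at the first missing or failing one.
run_steps() {
    local script
    for script in "$@"; do
        if [[ ! -f "$script" ]]; then
            echo "missing script: $script" >&2
            return 1
        fi
        echo "running $script"
        # Explicit check, so the loop stops even where `set -e` is not in effect.
        bash "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

From inside `LLM_experiment/`, this would be called as `run_steps make_embedding_score.sh run_experiment.sh`.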

