# The Best of N Worlds

This repository accompanies the paper *The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@l Optimization*.


## Prerequisites

1. We use Python 3.11 in this repository

* install [pyenv](https://github.com/pyenv/pyenv#installation)
* install Python:
  ```bash
  pyenv install 3.11
  ```
2. We use Poetry as the package manager:
* install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)
* tell Poetry which Python to use:
  ```bash
  pyenv shell 3.11
  python --version  # confirm that pyenv activated Python 3.11
  poetry env use "$(which python)"
  ```
* install dependencies:
  ```bash
  poetry install
  ```


3. **Set up environment variables** (create a `.env` file):
   ```bash
   WANDB_KEY=your_wandb_key
   WANDB_HOST=your_wandb_host
   ```
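Before starting a long run, it can help to confirm that the `.env` file defines both variables. The snippet below is a minimal sketch, not part of the repository: the helper names `load_env_file` and `missing_keys` are hypothetical, and it assumes the file contains plain `KEY=VALUE` lines as shown above.

```python
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str],
                 required: tuple[str, ...] = ("WANDB_KEY", "WANDB_HOST")) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(load_env_file())` from the repository root should return an empty list when the file is complete.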

## Datasets

  Datasets used in this work can be found [here](https://zenodo.org/records/17193154?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjNlNjJkNTM2LWJiYmQtNDZiZC1hOGYyLWExYjBlYmIzZDNmYSIsImRhdGEiOnt9LCJyYW5kb20iOiJiMDJkMDIzMWYzNDNkMGYyZTZiNjM4YTU1MzMzM2VjMyJ9.ujlPih0bbzFEfC2JhpVHy1fwMrGfJAXxDsh7audLP7IC9KfO3JethcKklnvLRDPe-1qU0KT1eCDnzwB2HMOXjQ). Alternatively, you can prepare them yourself by running the following scripts:

  ```bash
  poetry run python data_processing/prepare_codecontests.py
  poetry run python data_processing/prepare_livecodebench.py
  poetry run python data_processing/prepare_livebench.py
  poetry run python data_processing/prepare_codeforces.py
  poetry run python data_processing/prepare_mbpp_io.py
  ```

## Execution server
We run generated code safely inside a Docker-based execution server. To launch it:

   ```bash
   cd docker_execution
   ./build_and_run.sh
   # Server will be available at localhost:1337
   ```
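To confirm that the server came up before launching training or evaluation, a plain TCP check against the port is often enough. This is a minimal sketch, assuming the server listens on `localhost:1337` as noted above; the function name `server_is_reachable` is hypothetical, and the check only verifies that the port accepts connections, not that the server's API responds correctly.

```python
import socket


def server_is_reachable(host: str = "localhost", port: int = 1337,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("execution server reachable:", server_is_reachable())
```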





## Training

To train a model, run the following command:
  ```bash
  poetry run python main.py --config-name 'config_name'
  ```

Replace `config_name` with the name of the desired config from the `configs` folder.

## Evaluation

To evaluate a model, run the following command:

  ```bash
  poetry run python run_eval.py --model 'model_path' --dataset 'dataset_name' --k k 
  ```

  * `model_path` - local path to a checkpoint or the name of a model on HuggingFace
  * `dataset_name` - name of the dataset to evaluate on. One of: `codecontests`, `livecodebench`, `livebench`, `codeforces`, `mbpp`.
  * `k` - number of generations per datapoint
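To evaluate one model on every supported dataset, the command above can be generated in a loop. The sketch below only builds the command strings from the flags and dataset names listed above; the helper `build_eval_commands` is hypothetical and not part of the repository.

```python
import shlex

# Dataset names accepted by run_eval.py, as listed above.
DATASETS = ["codecontests", "livecodebench", "livebench", "codeforces", "mbpp"]


def build_eval_commands(model_path: str, k: int,
                        datasets: list[str] = DATASETS) -> list[str]:
    """Build one shell-quoted run_eval.py command per dataset."""
    commands = []
    for name in datasets:
        args = ["poetry", "run", "python", "run_eval.py",
                "--model", model_path, "--dataset", name, "--k", str(k)]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # 'model_path' and 8 are placeholder values; substitute your own.
    for command in build_eval_commands("model_path", 8):
        print(command)
```

The printed lines can be pasted into a shell or passed to a job scheduler.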