
# Stochastic Regularized Best-of-N (SRBoN)

This repository contains the implementation of **Stochastic Regularized Best-of-N (SRBoN)**.

## Setup Instructions

### Step 1: Environment Setup

Create and activate a virtual environment, then install the required dependencies:

```bash
# Create virtual environment
python3 -m venv env
source env/bin/activate

# Install dependencies
pip install -r requirements.txt
```
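
Before installing or running anything, it can help to confirm that the virtual environment is actually active in the current shell. The sketch below relies on the `VIRTUAL_ENV` variable that the standard `venv` activate script exports; the function name `venv_status` is a hypothetical helper, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report whether a
# Python virtual environment is active in the current shell.
venv_status() {
  # `source env/bin/activate` exports VIRTUAL_ENV; it is unset or empty otherwise.
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "inactive"
  fi
}
```

Calling `venv_status` after `source env/bin/activate` should print the path of the `env` directory.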

### Step 2: Sample Collection and Metric Computation

#### Collect Samples

Collect samples from the model with the following command, specifying the dataset (`-d`), the model (`-m`), and the number of samples (`-s`):

```bash
./experiments/sample.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
```

#### Compute Metrics

Compute utility metrics such as log probability, Wasserstein distance, and token length with the following scripts, which take the same `-d`, `-m`, and `-s` options as `sample.sh`:

```bash
# Compute log probability
./experiments/compute_logprob.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]

# Compute Wasserstein distance
./experiments/compute_wd.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]

# Compute token length
./experiments/compute_length.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
```

#### Compute Reward Values

Compute reward values with the following command, specifying the dataset (`-d`), the number of samples (`-s`), and the reward type (`-i`). The reward type is given as a reward model name such as `OpenAssistant/reward-model-deberta-v3-large-v2`:

```bash
./experiments/compute_reward.sh -d [DATASETS] -s [NUMBER_OF_SAMPLES] -i [REWARD_TYPE]
```

### Step 3: Running SRBoN

Finally, to compute the SRBoN values, run the following script:

```bash
python3 stochastic_rbon/stochastic_rbon.py --dataset [DATASETS] --ncandidates [NUMBER_OF_SAMPLES]
```

## Examples

Below is an example of running the SRBoN pipeline using the `alpaca` dataset, the `HuggingFaceH4/mistral-7b-sft-beta` model, and 100 samples.

```bash
# Collect 100 samples from the model
./experiments/sample.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100

# Compute log probabilities for the collected samples
./experiments/compute_logprob.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100

# Compute Wasserstein distance for the samples
./experiments/compute_wd.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100

# Compute the token length of the samples
./experiments/compute_length.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100
```

Next, compute reward values with one or more reward models:

```bash
# Compute reward values using the OpenAssistant reward model
./experiments/compute_reward.sh -d alpaca -s 100 -i OpenAssistant/reward-model-deberta-v3-large-v2

# Compute reward values using the openbmb Eurus-RM-7b reward model
./experiments/compute_reward.sh -d alpaca -s 100 -i openbmb/Eurus-RM-7b
```
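
When several reward models are evaluated on the same samples, the repeated calls can be wrapped in a loop. The following is a minimal sketch, not part of the repository: the function name `compute_rewards` and the `DRY_RUN` variable are assumptions introduced here, and only the `compute_reward.sh` flags shown above are used. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run compute_reward.sh
# once per reward model for a fixed dataset and sample count.
compute_rewards() {
  local dataset="$1" nsamples="$2"
  shift 2  # the remaining arguments are reward model names
  local reward_model
  for reward_model in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run (the default): print the command instead of executing it.
      echo "./experiments/compute_reward.sh -d ${dataset} -s ${nsamples} -i ${reward_model}"
    else
      ./experiments/compute_reward.sh -d "${dataset}" -s "${nsamples}" -i "${reward_model}"
    fi
  done
}
```

For example, `DRY_RUN=0 compute_rewards alpaca 100 OpenAssistant/reward-model-deberta-v3-large-v2 openbmb/Eurus-RM-7b` would execute the two commands above in sequence.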

Finally, run the **SRBoN** computation with 100 candidates:

```bash
python3 stochastic_rbon/stochastic_rbon.py --dataset alpaca --ncandidates 100
```
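
The steps above can also be chained in a single wrapper that stops at the first failing step. This is a sketch under stated assumptions rather than a script shipped with the repository: `srbon_pipeline` and `DRY_RUN` are hypothetical names, the step order simply follows this README, and the arguments are assumed to contain no spaces. By default it only prints each command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): run the documented
# steps in README order for one dataset, model, and reward model.
srbon_pipeline() {
  local dataset="$1" model="$2" nsamples="$3" reward_model="$4"
  local -a steps=(
    "./experiments/sample.sh -d ${dataset} -m ${model} -s ${nsamples}"
    "./experiments/compute_logprob.sh -d ${dataset} -m ${model} -s ${nsamples}"
    "./experiments/compute_wd.sh -d ${dataset} -m ${model} -s ${nsamples}"
    "./experiments/compute_length.sh -d ${dataset} -m ${model} -s ${nsamples}"
    "./experiments/compute_reward.sh -d ${dataset} -s ${nsamples} -i ${reward_model}"
    "python3 stochastic_rbon/stochastic_rbon.py --dataset ${dataset} --ncandidates ${nsamples}"
  )
  local step
  for step in "${steps[@]}"; do
    echo "+ ${step}"
    if [ "${DRY_RUN:-1}" != "1" ]; then
      # Intentional word splitting: arguments are assumed to contain no spaces.
      ${step} || return 1
    fi
  done
}
```

Running `srbon_pipeline alpaca HuggingFaceH4/mistral-7b-sft-beta 100 openbmb/Eurus-RM-7b` prints the six commands for review; prefixing the call with `DRY_RUN=0` executes them from the repository root.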
