# README

This repo is the anonymized code base for the paper `What Matters in Data for DPO`.

## Environment setup

1. Dependencies.
  
    Review `cu124_olmo2.yaml`, then create the environment with `conda env create -f cu124_olmo2.yaml`.

2. Log in to Weights & Biases and Hugging Face from your shell.
    ```bash
    pip install wandb
    wandb login
    huggingface-cli login
    ```

## Workflow

### Training

***We have uploaded all the datasets we used to the Hugging Face Hub. However, due to anonymity concerns, we cannot provide the original datasets here, so you have to rebuild the data from scratch to reproduce our work. If you already have the training data, you can start directly from step 4.***

1. Find a suitable preference dataset that contains "chosen" and "rejected" columns.
2. Run `run_annotation.py` to annotate the dataset.
    ```bash
    python run_annotation.py --ds_path=<path_to_dataset> --rm_path=<path_to_rm>
    ```
    You can also run `annot.sh` for distributed annotation.
3. To build the dataset described in the paper, you also need to run `build_diff_dataset.py`. In that script, the input parameter is set at line 195 and the output paths at lines 240-243.
4. Prepare a DPO training config, following the example in `configs/template.yaml`. Then kick off the training with:
    ```bash
    accelerate launch --config_file=configs/zero2.yaml run_dpo.py configs/PATH_TO_YOUR_CONFIG.yaml --train_path=<training_dataset_path> --output_dir=<model_output_dir> --n_iter=0 --manual_seed=42 --seed=42 --data_seed=42
    ```
    To reproduce the results in the paper, use `configs/llama_8b_base.yaml` with seeds 42, 1024, and 3407.
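
The launch command in step 4 can be repeated once per seed. Below is a minimal Python sketch that only builds and prints the three commands. It assumes that `--manual_seed`, `--seed`, and `--data_seed` should all share the same value (as in the example above) and uses a hypothetical `seed_<N>` sub-directory per run; neither choice is confirmed by the paper, so adjust as needed.

```python
from __future__ import annotations

import shlex

# Seeds named in this README for reproducing the paper's results.
SEEDS = (42, 1024, 3407)


def build_dpo_command(train_path: str, output_root: str, seed: int,
                      config: str = "configs/llama_8b_base.yaml") -> list[str]:
    """Return the argv list for one DPO run with the given seed."""
    return [
        "accelerate", "launch", "--config_file=configs/zero2.yaml",
        "run_dpo.py", config,
        f"--train_path={train_path}",
        # Hypothetical layout: one output sub-directory per seed.
        f"--output_dir={output_root}/seed_{seed}",
        "--n_iter=0",
        # Assumption: all three seed flags take the same value.
        f"--manual_seed={seed}", f"--seed={seed}", f"--data_seed={seed}",
    ]


def build_seed_sweep(train_path: str, output_root: str,
                     seeds=SEEDS) -> list[list[str]]:
    """Return one command per seed, in the order given."""
    return [build_dpo_command(train_path, output_root, s) for s in seeds]


if __name__ == "__main__":
    # Print shell-quoted commands; replace the placeholders before running them.
    for cmd in build_seed_sweep("<training_dataset_path>", "<model_output_dir>"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch side-effect free; each printed line can be pasted into a shell or passed to `subprocess.run`.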

### Evaluation

As described in the paper, all evaluations except Alpaca-Eval use TULU3's open-source evaluation suite directly.

To run the TULU3 evaluation suite, first install `olmes` by following the official instructions [here](https://github.com/allenai/olmes). Then run the following command:
```bash
olmes --model <model-path> --task mmlu:mc::mmlu truthfulqa::tulu gsm8k::tulu ifeval::tulu --model-args '{"max_length": 4096}' --output-dir <output-dir>
```

> Alpaca-Eval's Hugging Face dataset support was broken (at least at the time we wrote the paper), so we keep a local copy of its reference completion data in `raw_data/`.

To run Alpaca-Eval, first generate outputs from the trained model:
```bash
GENLM=<trained-model-path> OUTN=<generation-output-path> bash gen_alpaca_eval_outputs.sh
```

Then run the official evaluation script:
```bash
OPENAI_API_KEY=<secret> alpaca_eval evaluate --model_outputs <generation-json-output-path> --reference_outputs raw_data/alpaca_eval_ref_gpt4.json
```
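
Because the evaluation step calls the OpenAI API, it can be worth sanity-checking the generation file first. The sketch below assumes the generation script writes a JSON list of objects with string `instruction` and `output` fields, which is the shape `alpaca_eval` expects for `--model_outputs`; the helper name `find_bad_records` is hypothetical and not part of this repo.

```python
import json
import sys

# Fields assumed to be present in every generated record.
REQUIRED_KEYS = ("instruction", "output")


def find_bad_records(records):
    """Return (index, reason) pairs for records that look malformed."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str):
                problems.append((i, f"missing or non-string '{key}'"))
            elif key == "output" and not value.strip():
                # An empty generation would be judged as a loss; flag it early.
                problems.append((i, "empty output"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_outputs.py <generation-json-output-path>
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = find_bad_records(json.load(f))
    for index, reason in issues:
        print(f"record {index}: {reason}")
    sys.exit(1 if issues else 0)
```

A non-zero exit code signals that at least one record should be fixed or regenerated before spending API credits on the evaluation.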