# Supplementary Material: The Differences Between Direct Alignment Algorithms are a Blur

## Project Overview

This repository contains the codebase and scripts used for the experiments in the paper "The Differences Between Direct Alignment Algorithms are a Blur". The code is fully anonymized for the review process. It provides implementations and reproducible pipelines for supervised fine-tuning (SFT), preference-based training, inference, and evaluation.

## Installation

**Python version:** This project requires Python 3.10.

Install all dependencies using pip:

```bash
pip install -r requirements.txt
```

You may also need to install additional CUDA or hardware-specific libraries depending on your environment (see configs and scripts for details).
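
Because the requirements target Python 3.10, it can help to confirm which interpreter will receive the packages before installing. The function below is a minimal sketch and is not part of this repository; the name `require_python_version` and the `PYTHON` override variable are hypothetical.

```bash
# Hypothetical helper (not shipped with this repository): succeed only if the
# interpreter named by $PYTHON (default: python3) reports the wanted major.minor version.
require_python_version() {
  local want="${1:-3.10}" python_bin="${PYTHON:-python3}" have
  # Ask the interpreter itself for its version, formatted like "3.10".
  have=$("$python_bin" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || {
    echo "could not run $python_bin" >&2
    return 1
  }
  if [ "$have" != "$want" ]; then
    echo "expected Python $want, found $have ($python_bin)" >&2
    return 1
  fi
  echo "Python $have OK"
}
```

For example, `require_python_version 3.10 && python3 -m pip install -r requirements.txt` installs the requirements only when `python3` is 3.10; invoking pip as `python3 -m pip` keeps the install tied to the interpreter that was just checked.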

## Configuration Files

Example configuration files for training and inference are provided in the `configs/` directory:

- `configs/train/` — training experiment configs (preference-based, SFT, DeepSpeed, etc.)
- `configs/inference/` — inference and evaluation configs
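
The usage examples pass configs as `.json` paths. Assuming every file under `configs/` is JSON (DeepSpeed configs included), a quick parse check before launching can catch a broken edit early. This is an illustrative sketch, not part of the repository: `validate_json_configs` is a hypothetical name, and it relies only on `find` and Python's standard-library `json.tool` module.

```bash
# Hypothetical helper (not part of the repo): report every *.json file under the
# given directories (default: configs) that Python's json.tool cannot parse.
validate_json_configs() {
  [ "$#" -gt 0 ] || set -- configs
  local d bad
  for d in "$@"; do
    [ -d "$d" ] || { echo "no such directory: $d" >&2; return 1; }
  done
  # Parse each file with the standard-library JSON parser; print the paths that fail.
  bad=$(find "$@" -type f -name '*.json' -exec sh -c '
    for f in "$@"; do
      python3 -m json.tool "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
    done' sh {} +)
  if [ -n "$bad" ]; then
    printf 'invalid JSON:\n%s\n' "$bad" >&2
    return 1
  fi
  echo "all JSON configs parsed"
}
```

Running `validate_json_configs configs/train configs/inference` from the project root checks both directories at once.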

## Training and Inference Scripts

All main scripts for launching experiments are located in the `bin/` directory:

- `bin/train/run_pref.sh` — launch preference-based (DPO and related) training
- `bin/train/run_sft.sh` — launch supervised fine-tuning (SFT) training
- `bin/inference/run_gen.sh` — run model inference
- `bin/inference/run_offline_metrics.sh` — run offline metrics evaluation

These scripts handle environment setup, multi-GPU execution, and optional DeepSpeed integration.

**Note:** All commands should be run from the project root directory.
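
Since the scripts are invoked with relative paths such as `bin/train/run_pref.sh`, a small guard can stop a command early when it is launched from the wrong directory. The sketch below is hypothetical (`ensure_project_root` is not part of the repository) and only checks for the `bin/` and `configs/` subdirectories listed above.

```bash
# Hypothetical guard (not part of the repo): fail unless the current directory
# contains the bin/train, bin/inference and configs directories.
ensure_project_root() {
  local d
  for d in bin/train bin/inference configs; do
    if [ ! -d "$d" ]; then
      echo "missing ./$d: run this from the project root" >&2
      return 1
    fi
  done
}
```

For example, `ensure_project_root && bash bin/inference/run_offline_metrics.sh <path_to_offline_metrics_config.json>` runs the script only from the root.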

## Toy Example: Prompt Bias Experiment

The `toy_example/` directory contains a self-contained notebook and result plots for the toy experiment on prompt bias, as described in the supplementary material (Appendix, Section "Experiment on Prompt Bias").

- `toy_example/losses.ipynb` — Jupyter notebook with code to reproduce the toy experiment, including data generation, model training, and evaluation.
- `toy_example/plots/` — PDF figures generated by the notebook, visualizing the results for different model capacities and bias regimes.

This experiment provides additional evidence and intuition for the main findings of the paper regarding the interaction of alignment objectives and prompt-specific bias.

## Usage Examples

### Preference-Based Training

```bash
bash bin/train/run_pref.sh --use-deepspeed <true|false> --wandb-key <WANDB_API_KEY> --config-path <path_to_config.json>
```

### Supervised Fine-Tuning (SFT)

```bash
bash bin/train/run_sft.sh --use-deepspeed <true|false> --wandb-key <WANDB_API_KEY> --config-path <path_to_config.json>
```
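
Both training scripts take the same three flags. One way to avoid repeating the W&B key in every command is a thin wrapper that reads it from the `WANDB_API_KEY` environment variable and validates the other arguments first. This is a sketch under that assumption: `launch_training` is hypothetical and only reuses the script paths and flags shown above. The key is still passed to the script as a command-line argument, because that is the interface the scripts expose.

```bash
# Hypothetical wrapper (not part of the repo).
# Usage: launch_training <pref|sft> <config.json> [true|false]
launch_training() {
  local mode="$1" config="$2" use_ds="${3:-false}"
  case "$mode" in
    pref|sft) ;;
    *) echo "mode must be 'pref' or 'sft', got '$mode'" >&2; return 2 ;;
  esac
  case "$use_ds" in
    true|false) ;;
    *) echo "--use-deepspeed must be true or false, got '$use_ds'" >&2; return 2 ;;
  esac
  # Read the W&B key from the environment instead of the command line.
  if [ -z "${WANDB_API_KEY:-}" ]; then
    echo "set WANDB_API_KEY first" >&2
    return 1
  fi
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
  # Select bin/train/run_pref.sh or bin/train/run_sft.sh and forward the flags.
  bash "bin/train/run_${mode}.sh" \
    --use-deepspeed "$use_ds" \
    --wandb-key "$WANDB_API_KEY" \
    --config-path "$config"
}
```

With the key exported once (for example from a shell profile), `launch_training sft <path_to_config.json> true` is equivalent to the SFT command above with `--use-deepspeed true`.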

### Inference

```bash
bash bin/inference/run_gen.sh <use_vllm:true|false> <path_to_inference_config.json>
```

### Offline Metrics Evaluation

```bash
bash bin/inference/run_offline_metrics.sh <path_to_offline_metrics_config.json>
```
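
Generation and offline metrics are separate scripts. If the offline-metrics config already points at the outputs written by the generation step (an assumption; check your configs), the two can be chained so that metrics run only after generation succeeds. `run_eval` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper (not part of the repo): run generation, then offline
# metrics, stopping early if generation fails.
# Usage: run_eval <true|false> <inference_config.json> <offline_metrics_config.json>
run_eval() {
  local use_vllm="$1" gen_cfg="$2" metrics_cfg="$3" cfg
  case "$use_vllm" in
    true|false) ;;
    *) echo "first argument (use vLLM) must be true or false" >&2; return 2 ;;
  esac
  for cfg in "$gen_cfg" "$metrics_cfg"; do
    [ -f "$cfg" ] || { echo "config not found: $cfg" >&2; return 1; }
  done
  # Positional arguments follow the inference usage shown above.
  bash bin/inference/run_gen.sh "$use_vllm" "$gen_cfg" || return 1
  bash bin/inference/run_offline_metrics.sh "$metrics_cfg"
}
```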

## Output

- Training outputs are saved in `train_output/`
- Inference outputs are saved in `inference_output/`

## Reproducibility

- All configuration files used for experiments are provided in the `configs/` directory.
- Scripts are designed to run as-is on a multi-GPU server with Python 3.10 and CUDA support.
- For dataset access, see the anonymous link at the bottom of this file.

## Contact

For questions regarding this codebase, please use the OpenReview discussion forum for the paper.

## Dataset Access

Anonymous link to the formatted UltraFeedback, UltraChat, and Reddit TL;DR datasets:  
https://drive.google.com/file/d/1WkxnQqu_vNZwOUPaLgDwdHUTIgyUyHKF/view?usp=sharing
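
For command-line downloads, one option is the third-party `gdown` tool (`pip install gdown`), which accepts Google Drive URLs of the form `https://drive.google.com/uc?id=<ID>`. The hypothetical helper below, which is not part of the repository, extracts the file ID from a share link like the one above and prints that URL. The format of the downloaded file is not stated here, so inspect it before extracting.

```bash
# Hypothetical helper (not part of the repo): turn a share link of the form
# .../file/d/<ID>/view?... into a https://drive.google.com/uc?id=<ID> URL.
drive_download_url() {
  local id
  # Capture the path segment right after /file/d/ (stops at the next / or ?).
  id=$(printf '%s\n' "$1" | sed -n 's#.*/file/d/\([^/?]*\).*#\1#p')
  if [ -z "$id" ]; then
    echo "no file ID found in: $1" >&2
    return 1
  fi
  printf 'https://drive.google.com/uc?id=%s\n' "$id"
}
```

For example: `gdown "$(drive_download_url 'https://drive.google.com/file/d/1WkxnQqu_vNZwOUPaLgDwdHUTIgyUyHKF/view?usp=sharing')"`.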
