# Self-Debias: Fairness-Aware Self-Correction

This repository contains the implementation of **Self-Debias**, a method for training fair and unbiased language models through self-correction and resource allocation.

## Overview

Self-Debias consists of four training stages:

1. **Stage 0**: Supervised Fine-Tuning (SFT)
2. **Stage 1**: Self-Correction (SC) Training
3. **Stage 2**: DPO with Fairness Regularization (FR)
4. **Stage 3**: Online Iterative Training

## Quick Start

### Environment Setup

```bash
pip install torch transformers vllm openai tqdm
```

### Stage 0: SFT Training

```bash
python train_0_sft.py \
    --base_model /path/to/base/model \
    --data data/sft_data.jsonl \
    --output_dir ckpt/stage0
```
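This README does not document the schema of `data/sft_data.jsonl`. The following is a minimal loader sketch that assumes JSON Lines (one object per line) with hypothetical `prompt` and `response` fields. Rename `REQUIRED_FIELDS` to match whatever `train_0_sft.py` actually reads.

```python
import json
from pathlib import Path

# Hypothetical schema: the field names used by train_0_sft.py may differ.
REQUIRED_FIELDS = ("prompt", "response")


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file, skipping blank lines and checking required fields."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"{path}:{line_no} missing fields {missing}")
            records.append(record)
    return records
```

Running `load_jsonl("data/sft_data.jsonl")` before training surfaces malformed lines with their line numbers instead of failing partway through a run.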

### Stage 1: Self-Correction Training

```bash
python train_1_sc.py \
    --base_model /path/to/base/model \
    --prev_adapter ckpt/stage0/final_adapter \
    --data data/sc_data.jsonl \
    --output_dir ckpt/stage1
```
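Stage 1 trains the model to revise its own, possibly biased, answer. The format of `data/sc_data.jsonl` is not specified here. The following is a minimal sketch that assumes hypothetical `question`, `draft`, and `revision` fields and converts each record into a prompt/target pair for supervised training. The instruction wording is also an assumption.

```python
# Hypothetical instruction text; the wording used by train_1_sc.py is not documented here.
CORRECTION_INSTRUCTION = (
    "Review your previous answer for social bias or stereotyping "
    "and rewrite it so that it is fair and accurate."
)


def build_sc_example(record: dict) -> dict:
    """Turn a (question, draft, revision) record into a prompt/target pair."""
    prompt = (
        f"Question: {record['question']}\n"
        f"Previous answer: {record['draft']}\n"
        f"{CORRECTION_INSTRUCTION}\n"
        "Revised answer:"
    )
    # The draft appears only in the prompt; the revision is the sole target,
    # so the model is supervised on producing the corrected answer.
    return {"prompt": prompt, "target": record["revision"]}
```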

### Stage 2: DPO + Fairness Regularization

```bash
python train_2_fr.py \
    --base_model /path/to/base/model \
    --prev_adapter ckpt/stage1/final_adapter \
    --data data/dpo_data.jsonl \
    --output_dir ckpt/stage2 \
    --self_debias_alpha 0.25 \
    --fair_beta 0.1
```
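According to Key Parameters, `--self_debias_alpha` weights the DPO loss and `--fair_beta` weights the fairness regularizer. The exact objective is defined in `self_debias_trainer.py` and is not reproduced here. The following is a minimal sketch that assumes a weighted sum of the standard DPO loss and a hypothetical fairness penalty: the gap between the highest and lowest mean DPO loss across demographic groups. The `group` field and `dpo_temperature` (the usual DPO β, unrelated to `--fair_beta`) are assumptions.

```python
import math
from collections import defaultdict


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             dpo_temperature: float = 0.1) -> float:
    """Standard DPO loss for one pair, from sequence log-probs under policy and reference."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(dpo_temperature * margin)


def combined_loss(batch: list[dict], self_debias_alpha: float = 0.25,
                  fair_beta: float = 0.1) -> float:
    """Hypothetical objective: alpha * mean DPO loss + beta * cross-group loss gap."""
    per_group = defaultdict(list)
    losses = []
    for ex in batch:
        loss = dpo_loss(ex["policy_chosen"], ex["policy_rejected"],
                        ex["ref_chosen"], ex["ref_rejected"])
        losses.append(loss)
        per_group[ex["group"]].append(loss)
    mean_dpo = sum(losses) / len(losses)
    group_means = [sum(v) / len(v) for v in per_group.values()]
    # Penalize the model when preference learning works better for some groups than others.
    fairness_gap = max(group_means) - min(group_means)
    return self_debias_alpha * mean_dpo + fair_beta * fairness_gap
```

With a single group the gap term is zero, and the objective reduces to `self_debias_alpha` times the mean DPO loss.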

### Stage 3: Online Training

#### Step 1: Generate Training Data

```bash
# Set API credentials
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"

# Generate data with multi-GPU support
python online_step1.py \
    --model_path /path/to/model \
    --adapter_path ckpt/stage2/final_adapter \
    --input_file data/unlabeled.jsonl \
    --output_file data/online_raw.jsonl \
    --tensor_parallel_size 2 \
    --api_workers 20
```
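`--api_workers` controls how many bias-injection API requests run concurrently. The following is a minimal sketch of that pattern using `concurrent.futures.ThreadPoolExecutor`. Threads suit these I/O-bound calls, and `executor.map` preserves input order. `inject_bias` is a hypothetical stand-in for the real request. With the `openai` v1 client, `OpenAI()` reads `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment variables exported above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def inject_bias(prompt: str) -> str:
    """Hypothetical stand-in for the API call that produces a biased variant."""
    return f"[biased] {prompt}"


def run_concurrently(prompts: list[str], worker: Callable[[str], str],
                     api_workers: int = 20) -> list[str]:
    """Apply `worker` to every prompt using up to `api_workers` threads.

    executor.map yields results in input order, so output[i] matches prompts[i].
    """
    with ThreadPoolExecutor(max_workers=api_workers) as executor:
        return list(executor.map(worker, prompts))


if __name__ == "__main__":
    print(run_concurrently(["q1", "q2"], inject_bias, api_workers=2))
```

If the API rate-limits you, lower `--api_workers` rather than adding threads. Any exception raised in a worker propagates when its result is collected.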

#### Step 2: Filter High-Quality Data

```bash
python online_step2_gen_chosen.py \
    --judge_model /path/to/judge/model \
    --input_file data/online_raw.jsonl \
    --output_file data/online_preference.jsonl
```
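This step uses a judge model to turn raw generations into a chosen/rejected preference file. The record schema and the filtering rule are not documented here. The following is a minimal sketch that assumes each record has hypothetical `prompt` and `candidates` fields. `judge` is any callable that returns a score, and `min_margin` is a hypothetical threshold that drops pairs the judge cannot clearly separate.

```python
from typing import Callable, Optional


def make_preference_pair(record: dict, judge: Callable[[str, str], float],
                         min_margin: float = 1.0) -> Optional[dict]:
    """Score each candidate with `judge`; keep the best as chosen, the worst as rejected.

    Returns None (record filtered out) when there are fewer than two candidates
    or the score gap is below `min_margin`.
    """
    candidates = record["candidates"]
    if len(candidates) < 2:
        return None
    # Score every candidate once, then sort ascending by score.
    scored = sorted(((judge(record["prompt"], c), c) for c in candidates),
                    key=lambda t: t[0])
    worst_score, rejected = scored[0]
    best_score, chosen = scored[-1]
    if best_score - worst_score < min_margin:
        return None
    return {"prompt": record["prompt"], "chosen": chosen, "rejected": rejected}
```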

#### Step 3: Online DPO Training

```bash
python train_3_online.py \
    --base_model /path/to/base/model \
    --prev_adapter ckpt/stage2/final_adapter \
    --data data/online_preference.jsonl \
    --output_dir ckpt/stage3
```
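The Overview calls Stage 3 online iterative training, but the commands above show a single pass. The following is a minimal sketch of repeating Steps 1 to 3 for several rounds, where each round generates data with the adapter trained in the previous round. The per-round file names and the `ckpt/stage3_round{k}` directories are hypothetical.

```python
import shlex


def online_round_commands(rounds: int, model_path: str = "/path/to/model",
                          base_model: str = "/path/to/base/model",
                          judge_model: str = "/path/to/judge/model") -> list[list[str]]:
    """Build argv lists for `rounds` iterations of generate -> judge -> train.

    Round 0 starts from the Stage 2 adapter; each later round starts from the
    adapter trained in the round before. Output paths are hypothetical.
    """
    adapter = "ckpt/stage2/final_adapter"
    commands = []
    for k in range(rounds):
        raw = f"data/online_raw_r{k}.jsonl"
        pref = f"data/online_preference_r{k}.jsonl"
        out = f"ckpt/stage3_round{k}"
        commands += [
            ["python", "online_step1.py", "--model_path", model_path,
             "--adapter_path", adapter, "--input_file", "data/unlabeled.jsonl",
             "--output_file", raw],
            ["python", "online_step2_gen_chosen.py", "--judge_model", judge_model,
             "--input_file", raw, "--output_file", pref],
            ["python", "train_3_online.py", "--base_model", base_model,
             "--prev_adapter", adapter, "--data", pref, "--output_dir", out],
        ]
        adapter = f"{out}/final_adapter"  # the next round generates with the updated policy
    return commands


if __name__ == "__main__":
    for cmd in online_round_commands(2):
        print(shlex.join(cmd))
```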

## Key Parameters

- `--self_debias_alpha`: Weight of the DPO loss in `train_2_fr.py` (default: 0.25)
- `--fair_beta`: Weight of the fairness regularization term in `train_2_fr.py` (default: 0.1)
- `--tensor_parallel_size`: Number of GPUs used for tensor (model) parallelism in `online_step1.py` (default: 2)
- `--api_workers`: Number of concurrent API threads used for bias injection in `online_step1.py` (default: 20)
- `--num_refinements`: Number of self-correction rounds (default: 3)
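`--num_refinements` sets how many times the model revises its own answer. The following is a minimal sketch of such a loop. `generate` is a hypothetical callable wrapping whatever model backend you use, the prompt wording is an assumption, and the early stop when a revision repeats the previous answer is an assumed convergence check that the repository does not necessarily perform.

```python
from typing import Callable


def self_correct(question: str, generate: Callable[[str], str],
                 num_refinements: int = 3) -> list[str]:
    """Return the initial answer followed by each refined answer."""
    answers = [generate(f"Question: {question}\nAnswer:")]
    for _ in range(num_refinements):
        prompt = (f"Question: {question}\nPrevious answer: {answers[-1]}\n"
                  "Check the answer for bias and revise it.\nRevised answer:")
        revised = generate(prompt)
        if revised == answers[-1]:
            break  # assumed convergence check: the model made no further change
        answers.append(revised)
    return answers
```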

## File Structure

```
src/
├── train_0_sft.py              # Stage 0: SFT
├── train_1_sc.py               # Stage 1: Self-Correction
├── train_2_fr.py               # Stage 2: DPO + FR
├── train_3_online.py           # Stage 3: Online Training
├── online_step1.py             # Online data generation
├── online_step2_gen_chosen.py  # Data filtering
├── self_debias_trainer.py      # Custom trainer implementation
└── README.md                   # This file
```

## Citation

If you use this code, please cite our paper:

```bibtex
@article{self-debias-2025,
  title={Self-Debias: Fairness-Aware Self-Correction for Language Models},
  author={Anonymous},
  year={2025}
}
```

## License

This code is released under the MIT License.
