# UFT: Unifying Supervised and Reinforcement Fine-Tuning

## Installation

```
conda create -n uft python=3.9
conda activate uft
bash install.sh
```

## Usage

### Training

```
python run.py
  --algo              Algorithm to use: {sft, rft, stage, r3, uft}
  --n_gpu             Number of GPUs
  --visible-devices   GPU indices to use, e.g., "0,1,2,3"
  --T                 Total training steps (default: 500)
  --T_hint            Maximum training steps with hint (default: 300)
  --data              Dataset: {countdown,math,kk_logic,others}
  --model             Model name (e.g., Qwen2.5-1.5B)
  --tp_size           
  --eval              Evaluate the model instead of training
  --idx IDX           Index of the current process (default: 0)
  --sft_loss_coef     Coefficient for the additional log-likelihood term on the hint
  --n_rollout         Number of trajectory rollouts (default: 4)
```

#### Example 
`python run.py --model Qwen/Qwen2.5-1.5B --data countdown`
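
The one-line example leaves every other option at its default. A fuller invocation might look like the sketch below, which uses only options from the list above. The GPU indices `0,1` and the pairing of `--n_gpu 2` with two visible devices are assumptions for illustration, and `--tp_size` is left out because its meaning is not documented here. The command is printed first so it can be checked before launching.

```shell
# Sketch: a UFT run with the documented defaults spelled out.
# GPU indices 0,1 are an illustrative assumption; adjust to your machine.
MODEL="Qwen/Qwen2.5-1.5B"
DATA="countdown"
CMD="python run.py --algo uft --model $MODEL --data $DATA --n_gpu 2 --visible-devices 0,1 --T 500 --T_hint 300 --n_rollout 4"

echo "$CMD"    # inspect the assembled command
# eval "$CMD"  # uncomment to start training
```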

#### Requirements
- `Qwen2.5-0.5B`, `Qwen2.5-1.5B`, and `Llama-3.2-1B`: 2 `H100` GPUs
- `Qwen2.5-3B` and `Llama-3.2-3B`: 4 `H100` GPUs

`Qwen2.5-0.5B`, `Qwen2.5-1.5B`, and `Llama-3.2-1B` can also be trained on 1 `H100` by setting `--n_rollout 2`.
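
For the single-GPU case, a command along these lines may work. It is a sketch built from the options listed under Training. Setting `--n_gpu 1` and using GPU index `0` are assumptions, since the text above only specifies the reduced rollout count.

```shell
# Sketch: single-H100 run with the rollout count reduced to 2.
# --n_gpu 1 and GPU index 0 are assumptions, not settings confirmed by the source.
MODEL="Qwen/Qwen2.5-1.5B"
DATA="countdown"
CMD_1GPU="python run.py --model $MODEL --data $DATA --n_gpu 1 --visible-devices 0 --n_rollout 2"

echo "$CMD_1GPU"    # inspect the assembled command
# eval "$CMD_1GPU"  # uncomment to start training
```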

### Evaluation

Replace `{model}` and `{dataset}` with the model name (*e.g.*, `Qwen/Qwen2.5-1.5B`) and the dataset name (*e.g.*, `countdown`) to evaluate:
```
python run.py --model {model} --data {dataset} --eval
```
