# ⚙️ Installation

Our codebase has been tested on A800 servers with the following environment:

* `python 3.10.0`
* `torch 2.6.0+cu124`

## 🔧 Set Up Training Environment

```bash
conda create -n ranktuner python=3.10 -y
conda activate ranktuner
cd verl
bash scripts/install_vllm_sglang_mcore.sh
pip install --no-deps -e .
```
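
After installation, a quick check can confirm that the active environment matches the tested versions listed above. This is a minimal sketch, not part of the repository: `check_python_version` is a hypothetical helper, and it assumes `python` on your `PATH` is the one from the `ranktuner` environment.

```bash
# Hypothetical helper (not shipped with the repo): succeeds only if the
# active interpreter reports the given major.minor version.
check_python_version() {
  local want="$1" have
  have="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)"
  [ "$have" = "$want" ]
}

if check_python_version "3.10"; then
  echo "python 3.10: ok"
else
  echo "python 3.10 not active; did you run 'conda activate ranktuner'?"
fi

# Print the installed torch build; the tested one is 2.6.0+cu124.
python -c 'import torch; print("torch", torch.__version__)' 2>/dev/null \
  || echo "torch is not importable in this environment"
```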

## Set Up MATH Evaluation Environment

```bash
conda create -n vllm_dft python=3.12 -y
conda activate vllm_dft
cd math_evaluation/latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt
pip install vllm==0.8.5 --no-build-isolation
pip install "transformers<4.54.0"
pip install python-dateutil pebble word2number datasets timeout_decorator multiprocess math-verify pylatexenc fire
```



# 🚀 Getting Started

## Step 1: Prepare Data

```bash
# Generate training data (optional: adjust --train_end to change the number of training examples)
python examples/data_preprocess/numina_cot.py --train_end 100000

# Generate evaluation data
python examples/data_preprocess/math_dataset.py
```
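
Before starting a training run, it may help to confirm that the Parquet files exist where the training command expects them. The sketch below assumes the preprocessing scripts write to `data/numina_cot/train.parquet` and `data/math500/test.parquet` (the paths read in Step 2) relative to the directory you train from; `check_files` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: report which of the given files are missing.
# Returns non-zero if at least one file is absent.
check_files() {
  local f missing=0
  for f in "$@"; do
    if [ -f "$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=1
    fi
  done
  return "$missing"
}

# Paths taken from the training command in Step 2.
check_files data/numina_cot/train.parquet data/math500/test.parquet \
  || echo "Re-run the preprocessing scripts, or check the working directory."
```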

## Step 2: Training

```bash
cd verl
conda activate ranktuner
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --standalone --nnodes=1 --nproc_per_node=4 \
  -m verl.trainer.fsdp_general_trainer \
  data.train_files="data/numina_cot/train.parquet" \
  data.val_files="data/math500/test.parquet" \
  data.prompt_key="extra_info" \
  data.response_key="extra_info" \
  data.prompt_dict_keys="['question']" \
  data.response_dict_keys="['answer']" \
  data.train_batch_size=256 \
  data.micro_batch_size_per_gpu=8 \
  data.max_length=2048 \
  optim.lr=5e-5 \
  model.partial_pretrain="Qwen/Qwen2.5-Math-7B" \
  model.use_liger=True \
  model.fsdp_config.model_dtype=bf16 \
  trainer.default_local_dir="checkpoints_5e-5/numina-cot-ranktuner-Qwen2.5-Math-7B-medium" \
  trainer.project_name="numina-cot" \
  trainer.experiment_name="numina-cot-ranktuner-Qwen2.5-Math-7B-medium-$(date +%Y%m%d-%H%M%S)" \
  trainer.logger="['console','tensorboard','wandb']" \
  trainer.default_hdfs_dir=null \
  trainer.test_freq=100 \
  trainer.save_freq=200 \
  trainer.total_epochs=1 \
  trainer.loss_type="ranktuner"
```
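
The settings above imply a rough step count, which is useful when choosing `trainer.save_freq` and when picking a checkpoint for evaluation. The arithmetic below is a sketch under stated assumptions: every one of the 100000 examples from Step 1 is kept, the last partial batch is dropped, and `data.train_batch_size` is the global batch size split evenly across GPUs. The trainer itself may count differently.

```bash
train_end=100000     # --train_end from Step 1
batch_size=256       # data.train_batch_size (assumed to be global)
n_gpus=4             # --nproc_per_node
micro_batch=8        # data.micro_batch_size_per_gpu

# Integer division drops the final partial batch.
steps=$(( train_end / batch_size ))
# Micro-batches each GPU processes per optimizer step.
accum=$(( batch_size / (n_gpus * micro_batch) ))

echo "optimizer steps per epoch: $steps"
echo "micro-batches per GPU per step: $accum"
```

Under these assumptions one epoch is 390 optimizer steps, which agrees with the `global_step_390` checkpoint evaluated in Step 3, and each GPU accumulates 8 micro-batches per step.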

## Step 3: Evaluation

### MATH Evaluation

```bash
cd math_evaluation/
conda activate vllm_dft
CUDA_VISIBLE_DEVICES=0,1,2,3 \
bash sh/eval.sh \
  "qwen25-math-cot" \
  "../verl/checkpoints_5e-5/numina-cot-ranktuner-Qwen2.5-Math-7B-medium/global_step_390" \
  "eval_output_5e-5/numina-cot-ranktuner-Qwen2.5-Math-7B-medium_global_step_390" \
  16 \
  1
```
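
To evaluate every saved checkpoint of a run rather than a single one, the command above can be wrapped in a loop. This is a sketch, not a script from the repository: `eval_out_dir` is a hypothetical helper that reproduces the output naming pattern used above (`eval_output_5e-5/<run>_<step>`), and the loop assumes it is run from `math_evaluation/` with the `vllm_dft` environment active. The remaining arguments to `sh/eval.sh` are passed through unchanged.

```bash
# Hypothetical helper: derive the output directory from a checkpoint path,
# following the pattern eval_output_5e-5/<run>_<step>.
eval_out_dir() {
  local ckpt="${1%/}"            # strip a trailing slash, if any
  local step="${ckpt##*/}"       # e.g. global_step_390
  local run_dir="${ckpt%/*}"
  local run="${run_dir##*/}"     # e.g. numina-cot-ranktuner-Qwen2.5-Math-7B-medium
  echo "eval_output_5e-5/${run}_${step}"
}

# Evaluate each checkpoint directory; skips cleanly if none exist.
for ckpt in ../verl/checkpoints_5e-5/numina-cot-ranktuner-Qwen2.5-Math-7B-medium/global_step_*; do
  [ -d "$ckpt" ] || continue
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  bash sh/eval.sh "qwen25-math-cot" "$ckpt" "$(eval_out_dir "$ckpt")" 16 1
done
```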
