# Optimized Early-Exit Based Speculative Decoding via Pipeline Parallelism

To ensure reproducibility, we provide detailed instructions for early-exit model initialization, training, and inference.


## Requirements

To install requirements:

```setup
pip install -r requirements.txt
```

## CutModel
- Use the following command to divide the original LLM into several parts suitable for early-exit training and pipeline-parallel execution. Set `--model_path` to the actual path of the model weights and `--num_ee_block` to your desired granularity.
```bash
python cut_model.py --model_path "/your/model/path" --num_ee_block 4
```
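
As a rough illustration of what `--num_ee_block` controls, the sketch below partitions decoder layer indices into contiguous, near-equal blocks. It assumes an even split and uses a hypothetical helper named `split_layers`; the actual logic in `cut_model.py` may differ.

```python
def split_layers(num_layers: int, num_blocks: int) -> list[range]:
    """Partition layer indices 0..num_layers-1 into contiguous, near-equal blocks.

    Hypothetical helper for illustration only; not taken from cut_model.py.
    """
    if not 1 <= num_blocks <= num_layers:
        raise ValueError("num_blocks must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional layer each.
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


# Vicuna-7B has 32 decoder layers, so 4 blocks hold 8 layers each.
for block_id, layers in enumerate(split_layers(32, 4)):
    print(block_id, list(layers))
```

Under this assumption, each block would map to one pipeline stage, with an early-exit point at the end of each block.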
## Training

- Use the following command to train Vicuna-7B. Set `--model_name_or_path` to the actual path of the model weights, `--data_path` to the actual path of the training data, `--split_model_path` to the actual path of the split model weights, `--headclass` to the class of head to train, and `--num_ee_block` to your desired granularity.
```bash
torchrun --nproc_per_node=4 --master_port=20001 train_mem.py \
    --model_name_or_path /your/model/path  \
    --data_path /your/data/path \
    --split_model_path /your/split/model/path \
    --bf16 True \
    --output_dir /output/path \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 5e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --num_ee_block 4 \
    --headclass "trm"
```
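
To see how the batch-related flags above combine, the following sketch estimates the global batch size, the number of optimizer steps, and the number of warmup steps. The helper `training_schedule` and the dataset size of 68,000 examples are hypothetical and only serve the illustration; the step count reported by the trainer may differ slightly.

```python
import math


def training_schedule(num_examples: int, num_processes: int, per_device_batch: int,
                      grad_accum: int, epochs: int, warmup_ratio: float):
    """Approximate the training schedule implied by the command-line flags.

    Hypothetical helper for illustration only; not part of train_mem.py.
    """
    # Examples consumed per optimizer step across all processes.
    global_batch = num_processes * per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_examples / global_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return global_batch, total_steps, warmup_steps


# Flags from the command above; num_examples is an assumed dataset size.
global_batch, total_steps, warmup_steps = training_schedule(
    num_examples=68_000, num_processes=4, per_device_batch=2,
    grad_accum=1, epochs=2, warmup_ratio=0.03,
)
print(global_batch, total_steps, warmup_steps)
```

With `--nproc_per_node=4`, `--per_device_train_batch_size 2`, and `--gradient_accumulation_steps 1`, the global batch size is 8 sequences per optimizer step.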

## Inference
- Use the following command to run inference. Set `--model_path` to the actual path of the model weights, `--split_model_path` to the actual path of the split model weights, `--data_path` to the actual path of the data, `--ckpt_path` to the actual path of the early-exit head weights, `--nproc_per_node` to your granularity, `--stage` to the early-exit point, `--maxlen` to the maximum output length, and `--headclass` to your head class.

```bash
torchrun --nproc_per_node=4 \
         --master_port=29989 \
     /PPSD/ee_vicuna_test_eval_any2.py \
     --model_path "/XXXX-3/space/models/vicuna-v1.5-7b" \
     --split_model_path "/XXXX-3/space/models/split-vicuna-v1.5-7b/" \
     --data_path "/XXXX-3/space/datasets/data.json" \
     --ckpt_path "/XXXX-3/space/ckpt/distill/ALL_vicuna_7b_ee_layers_lr5e-4_epoch2_logits+top1_trmhead/" \
     --headclass "trm" \
     --stage 0 \
     --maxlen 512
```
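
For orientation, the draft-then-verify step that speculative decoding generally relies on can be sketched as below. This is a generic greedy-acceptance illustration, not the code in `ee_vicuna_test_eval_any2.py`: the function `accept_draft` and its token-list inputs are hypothetical, and it assumes the early-exit head proposes draft tokens that the full model then checks.

```python
def accept_draft(draft_tokens: list[int], target_tokens: list[int]) -> list[int]:
    """Greedy verification of draft tokens against the full model.

    target_tokens[i] is the full model's greedy token at draft position i,
    plus one extra final entry that is emitted when every draft token matches.
    Hypothetical helper for illustration only.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have exactly one more entry than draft_tokens")
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok != target_tokens[i]:
            # First mismatch: emit the full model's token and stop.
            accepted.append(target_tokens[i])
            return accepted
        accepted.append(tok)
    # All drafts matched: also emit the full model's next token.
    accepted.append(target_tokens[len(draft_tokens)])
    return accepted


print(accept_draft([5, 7, 9], [5, 7, 2, 4]))
print(accept_draft([5, 7], [5, 7, 3]))
```

In this sketch the output always matches what the full model would generate greedily on its own, so any speedup comes from accepting several tokens per verification step.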

