### Set Up Environment

```bash
conda create -yn bitdelta python=3.9
conda activate bitdelta

pip install -e .
```

### Compress Model

Compress the weight delta and perform scale distillation:

```bash
CUDA_VISIBLE_DEVICES=0,1 python \
    bitdelta/train.py \
    --base_model meta-llama/Llama-2-7b-hf \
    --finetuned_model lmsys/vicuna-7b-v1.5 \
    --save_dir $MODEL_SAVE_DIR \
    --batch_size 4 \
    --num_steps 200 \
    --save_full_model True
```

Here `$MODEL_SAVE_DIR` is the output directory of your choice; set it before running the command.
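As a minimal sketch, the variable can be set in the shell before launching the command. The path below is purely hypothetical; any writable directory should work, and creating it up front is a precaution rather than a documented requirement of `train.py`.

```bash
# Hypothetical output location; replace with any writable directory.
export MODEL_SAVE_DIR="$PWD/outputs/vicuna-7b-v1.5-bitdelta"

# Create the directory in advance (harmless if it already exists).
mkdir -p "$MODEL_SAVE_DIR"
```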

If `--save_full_model` is specified, the compressed model will also be saved in HuggingFace format at `$MODEL_SAVE_DIR/calibrated_model`. Otherwise, only the delta will be saved.
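After compression finishes, a quick check of the save directory can confirm which artifacts were written. The sketch below assumes the delta is stored as `diff.pt` directly under `$MODEL_SAVE_DIR` (the path the perplexity command reads from); `check_bitdelta_outputs` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: report which artifacts exist in a save directory.
check_bitdelta_outputs() {
    local dir="$1"
    # Assumption: the compressed delta is saved as diff.pt in the save directory.
    if [ -f "$dir/diff.pt" ]; then
        echo "delta: found"
    else
        echo "delta: missing"
    fi
    # The full model is only written when --save_full_model is used.
    if [ -d "$dir/calibrated_model" ]; then
        echo "full model: found"
    else
        echo "full model: missing"
    fi
}

# Falls back to the current directory if MODEL_SAVE_DIR is unset.
check_bitdelta_outputs "${MODEL_SAVE_DIR:-.}"
```

A missing `calibrated_model` directory is expected when `--save_full_model` was not passed.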

### Perplexity Check

Double check the perplexity of the compressed model:

```bash
CUDA_VISIBLE_DEVICES=0 python \
    bitdelta/eval_ppl.py \
    --base_model meta-llama/Llama-2-7b-hf \
    --dataset_name wikitext \
    --subset wikitext-2-raw-v1 \
    --save_dir $PPL_SAVE_DIR \
    --num_eval_samples 100 \
    --model_diff $MODEL_SAVE_DIR/diff.pt
```

To replicate our other results, pass `--save_full_model` during compression so the model is saved in Llama format and can be run with eval harnesses.
