# Less is More: Undertraining Experts Improves Model Upcycling
For our NLP experiments, we adapted the [TIES Merging repo](https://github.com/prateeky2806/ties-merging); for the vision setting, we largely used the [Task Arithmetic repo](https://github.com/mlfoundations/task_vectors).

### References:
```bibtex
@inproceedings{
    yadav2023tiesmerging,
    title={{TIES}-Merging: Resolving Interference When Merging Models},
    author={Prateek Yadav and Derek Tam and Leshem Choshen and Colin Raffel and Mohit Bansal},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
    url={https://openreview.net/forum?id=xtaX3WyCj1}
}

@inproceedings{
    ilharco2023_task-arithmetic,
    title={Editing models with task arithmetic},
    author={Gabriel Ilharco and Marco Tulio Ribeiro and Mitchell Wortsman and Ludwig Schmidt and Hannaneh Hajishirzi and Ali Farhadi},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=6t0Kwf8-jrj}
}
```

## Set-up
1. Create a virtual environment, activate it, and install the dependencies.
```bash
# We use conda 4.8.3
conda create -n less-is-more python=3.8
conda activate less-is-more

python -m pip install -r requirements.txt -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install -U datasets
pip install --upgrade evaluate
```

2. Set the HuggingFace cache directory (this must be repeated after every reboot or new login session).
```bash
export HF_HOME="/path/to/your/hf_cache"
```


## Training
### Full Fine-tuning (FFT)
```bash
dataset="paws" # choices: "paws", "qasc", "quartz", "story_cloze", "winogrande", "wiki_qa", "wsc"
seed=0 # random seed
steps=2048 # number of training steps / iterations / batches

python src/training.py -c configs/t5_base.json -k project_name=training_fft experiment_name="${dataset}_s${seed}" train_dataset=$dataset inference_dataset=$dataset train_dataset_mixture=None inference_dataset_mixture=None num_batches=$steps split="validation" seed=$seed
```
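
The merging commands further down operate on a mixture of tasks, so in practice one expert is trained per dataset. The sketch below builds the full fine-tuning command shown above for each of the seven datasets and prints it as a dry run. The helper name `build_fft_command` is hypothetical and not part of the repo; the arguments simply mirror the command above.

```python
import shlex

# Dataset choices listed in the training command above.
DATASETS = ["paws", "qasc", "quartz", "story_cloze", "winogrande", "wiki_qa", "wsc"]


def build_fft_command(dataset, seed=0, steps=2048):
    """Return the FFT training command for one dataset as an argument list.

    Hypothetical helper: it only mirrors the README command, nothing more.
    """
    return [
        "python", "src/training.py",
        "-c", "configs/t5_base.json",
        "-k",
        "project_name=training_fft",
        f"experiment_name={dataset}_s{seed}",
        f"train_dataset={dataset}",
        f"inference_dataset={dataset}",
        "train_dataset_mixture=None",
        "inference_dataset_mixture=None",
        f"num_batches={steps}",
        "split=validation",
        f"seed={seed}",
    ]


commands = [build_fft_command(dataset) for dataset in DATASETS]
for cmd in commands:
    # Dry run: print the command instead of launching it.
    print(shlex.join(cmd))
```

To actually launch the runs, each argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root.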

### LoRA fine-tuning
```bash
dataset="paws" # choices: "paws", "qasc", "quartz", "story_cloze", "winogrande", "wiki_qa", "wsc"
seed=0 # random seed
steps=2048 # number of training steps / iterations / batches

python src/training_lora.py -c configs/t5_base.json -k project_name=training_lora experiment_name="${dataset}_s${seed}" train_dataset=$dataset inference_dataset=$dataset train_dataset_mixture=None inference_dataset_mixture=None num_batches=$steps split="validation" seed=$seed lr=0.0005
```

## Evaluation
```bash
split="test" # split to evaluate on ("validation" or "test")
dataset="paws" # choices: "paws", "qasc", "quartz", "story_cloze", "winogrande", "wiki_qa", "wsc"
seed=0 # random seed
step=2047 # step index of the checkpoint to load

python ./src/inference.py -c configs/t5_base.json -i $dataset --kwargs checkpoint_to_directly_load_model="./checkpoints/training_fft/t5-base/${dataset}_s${seed}/checkpoints/checkpoint_${step}.pt" split=$split project_name="inference_fft" experiment_name="ckpt_${dataset}_s${seed}_${step}_${split}"
```

## Merging
### MEAN
```bash
split="test" # split to evaluate the merged model on ("validation" or "test")
seed=0 # random seed
step=2047 # checkpoint number of steps to load

python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f basic_mean --kwargs split=$split project_name=merging_fft_average experiment_name="ckpt_s${seed}_${step}_${split}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step
```
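
For intuition, mean merging averages every parameter element-wise across the expert checkpoints. The following is a minimal conceptual sketch on toy data, not the code behind `basic_mean`: plain Python lists stand in for tensors, and the name `mean_merge` is hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flattened weights.
StateDict = Dict[str, List[float]]


def mean_merge(experts: List[StateDict]) -> StateDict:
    """Average each parameter element-wise across all expert checkpoints."""
    if not experts:
        raise ValueError("need at least one expert checkpoint")
    merged = {}
    for name in experts[0]:
        # zip(*...) groups the i-th weight of every expert together.
        columns = zip(*(expert[name] for expert in experts))
        merged[name] = [sum(column) / len(experts) for column in columns]
    return merged


expert_a = {"w": [1.0, 2.0], "b": [0.0]}
expert_b = {"w": [3.0, 6.0], "b": [1.0]}
merged = mean_merge([expert_a, expert_b])
print(merged)  # {'w': [2.0, 4.0], 'b': [0.5]}
```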

### Task Arithmetic (TA): validation and test
```bash
seed=0 # random seed
step=2047 # checkpoint number of steps to load

# Evaluate on the validation set over a range of hyperparameter values.
# "task-vector_linear+0.1+1.01+0.1" runs Task Arithmetic with the scaling hyperparameter swept from 0.1 to 1.0 in steps of 0.1.
python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f task-vector_linear+0.1+1.01+0.1 --kwargs split=validation project_name=merging_fft_ta_valid experiment_name="ckpt_s${seed}_${step}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step

# If 0.4 is the best hyperparameter value, evaluate on the test set with:
python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f task-vector_0.4 --kwargs split=test project_name=merging_fft_ta_test experiment_name="ckpt_s${seed}_${step}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step
```
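
As a rough illustration of what the sweep above does, the sketch below expands a `linear+start+stop+step` specification into candidate values and applies Task Arithmetic (pretrained weights plus a scaled sum of task vectors) to toy data. It assumes the stop value is exclusive, which is consistent with `1.01` yielding values up to 1.0. The names `expand_sweep` and `task_arithmetic` are hypothetical, and the repo's actual parsing may differ.

```python
import math
from typing import List


def expand_sweep(spec: str) -> List[float]:
    """Expand 'linear+start+stop+step' into values (stop assumed exclusive)."""
    kind, start, stop, step = spec.split("+")
    if kind != "linear":
        raise ValueError(f"unsupported sweep kind: {kind}")
    start, stop, step = float(start), float(stop), float(step)
    count = math.ceil((stop - start) / step)
    # Rounding hides floating-point drift such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(count)]


def task_arithmetic(pretrained: List[float],
                    experts: List[List[float]],
                    lam: float) -> List[float]:
    """Return pretrained + lam * sum of task vectors (expert - pretrained)."""
    merged = []
    for i, base in enumerate(pretrained):
        task_vector_sum = sum(expert[i] - base for expert in experts)
        merged.append(base + lam * task_vector_sum)
    return merged


lambdas = expand_sweep("linear+0.1+1.01+0.1")
print(lambdas)  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

pretrained = [0.0, 1.0]
experts = [[1.0, 1.0], [1.0, 3.0]]
print(task_arithmetic(pretrained, experts, lam=0.5))  # [1.0, 2.0]
```

In the workflow above, each value in the sweep would be scored on the validation split, and only the best one is then used for the single test-set run.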

### TIES
```bash
# Model arguments
seed=0 # random seed
step=2047 # checkpoint number of steps to load

# TIES method arguments
scale=linear+0.8+2.51+0.1 # how to interpolate the scaling hyperparameter alpha (from 0.8 to 2.5 in steps of 0.1)

# Evaluate on validation set for different values of k and alpha
redundant=topk10 # what percentage of top parameters to keep
python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f "${redundant}_mass_dis-mean_${scale}" --kwargs split=validation project_name=merging_fft_ties_valid experiment_name="topk10_ckpt_s${seed}_${step}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step
redundant=topk20
python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f "${redundant}_mass_dis-mean_${scale}" --kwargs split=validation project_name=merging_fft_ties_valid experiment_name="topk20_ckpt_s${seed}_${step}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step
redundant=topk30
python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f "${redundant}_mass_dis-mean_${scale}" --kwargs split=validation project_name=merging_fft_ties_valid experiment_name="topk30_ckpt_s${seed}_${step}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step

# Suppose keeping the top k=30% of parameters with alpha=1.1 achieves the best validation performance; then evaluate on the test set with:
python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f topk30_mass_dis-mean_1.1 --kwargs split=test project_name=merging_fft_ties_test experiment_name="ckpt_s${seed}_${step}" load_dir="./checkpoints/training_fft/t5-base" load_seed=$seed step=$step
```
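
To make the roles of `k` and `alpha` concrete, here is a minimal sketch of the three TIES steps as described in the TIES-Merging paper: trim each task vector to its top-k entries by magnitude, elect a sign per parameter from the summed trimmed values, and take the mean over the entries that agree with the elected sign. It runs on toy lists rather than model checkpoints, the function names are hypothetical, and exact sign cancellations are simply left at the pretrained value, which is a simplification and not necessarily what the repo does.

```python
from typing import List


def trim_top_k(task_vector: List[float], keep_fraction: float) -> List[float]:
    """Keep the largest-magnitude entries and zero out the rest.

    Entries tied with the threshold magnitude are all kept.
    """
    keep = max(1, int(len(task_vector) * keep_fraction))
    threshold = sorted((abs(v) for v in task_vector), reverse=True)[keep - 1]
    return [v if abs(v) >= threshold else 0.0 for v in task_vector]


def ties_merge(pretrained: List[float],
               experts: List[List[float]],
               keep_fraction: float,
               alpha: float) -> List[float]:
    """Trim, elect signs, and disjoint-mean the task vectors, then scale by alpha."""
    task_vectors = [[e - p for e, p in zip(expert, pretrained)] for expert in experts]
    trimmed = [trim_top_k(tv, keep_fraction) for tv in task_vectors]
    merged = []
    for i, base in enumerate(pretrained):
        values = [tv[i] for tv in trimmed]
        total = sum(values)  # the sign of the total is the elected sign
        if total == 0:
            merged.append(base)  # simplification: nothing kept, or exact cancellation
            continue
        sign = 1.0 if total > 0 else -1.0
        agreeing = [v for v in values if v * sign > 0]
        merged.append(base + alpha * sum(agreeing) / len(agreeing))
    return merged


pretrained = [0.0, 0.0, 0.0, 0.0]
expert_a = [4.0, -1.0, 0.5, 2.0]
expert_b = [2.0, 3.0, -0.1, -4.0]
result = ties_merge(pretrained, [expert_a, expert_b], keep_fraction=0.75, alpha=1.0)
print(result)  # [3.0, 3.0, 0.0, -4.0]
```

In this toy run, the third parameter is trimmed away in both experts and stays at its pretrained value, while the conflicting signs in the second and fourth parameters are resolved in favor of the larger total magnitude.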

### Merging LoRAs
To merge LoRAs, it suffices to adjust `load_dir` and add `lora="true"`. For example, for mean (average) merging:
```bash
split="test" # split to evaluate the merged model on ("validation" or "test")
seed=0 # random seed
step=2047 # checkpoint number of steps to load

python ./src/ties_merging.py -c configs/t5_base.json -i t5_mixture -m t5_mixture -f basic_mean --kwargs split=$split project_name=merging_lora_mean experiment_name="ckpt_s${seed}_${step}_${split}" load_dir="./checkpoints/training_lora/t5-base" load_seed=$seed step=$step lora="true"
```


