# MergeMoE: Compressing MoE Models By Merging Experts

Code for the paper "**MergeMoE: Compressing MoE Models By Merging Experts**"



## Setup

```shell
conda create -n mcsmoe python=3.9 -y && conda activate mcsmoe
pip install -r requirements.txt
```
## Usage and Examples

#### Merge MoE
```shell
CUDA_VISIBLE_DEVICES=0 \
  python mergemoe/merge-moe.py \
  --task="winogrande" \
  --num_samples_for_merging=64 \
  --num_groups=32 \
  --merging_layers="1,2,3,4,5,6,7,8" \
  --merging_strategy="ours" \
  --model_type="qwen" \
  --output_dir="results/winogrande/merged-qwen/" \
  --checkpoint="/root/model/Qwen1.5-MoE-A2.7B"
```
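You can modify the following parameters according to your needs:
* task: \
Task whose data is used to draw the merging samples (e.g. `winogrande`)
* num_samples_for_merging: \
Number of samples used for merging
* num_groups: \
Target number of expert groups
* merging_layers: \
Comma-separated indices of the layers to apply merging on
* merging_strategy: \
Merging strategy to use (the example uses `ours`)
* model_type: \
Model family of the checkpoint (the example uses `qwen`)
* output_dir: \
Directory where the merged model is written
* checkpoint: \
Path to pre-trained model checkpoint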

Refer to `mergemoe/merge-moe.py` for more details on each argument.
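To compare several compression levels, the merge command can be wrapped in a loop over `--num_groups`. The script below is a minimal sketch that reuses only the flags from the example above. The `DRY_RUN` switch, the `sweep_num_groups` helper, the per-group output directory naming, and the group counts 16/32/48 are all hypothetical choices for illustration, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: sweep --num_groups while keeping the other flags from the example above.
# Only prints the commands by default; run with DRY_RUN=0 to execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                                    # hypothetical switch: 1 = print only
CHECKPOINT="${CHECKPOINT:-/root/model/Qwen1.5-MoE-A2.7B}"  # adjust to your local checkpoint
TASK="winogrande"
LAYERS="1,2,3,4,5,6,7,8"

# Hypothetical helper: builds (and optionally runs) one merge command per group count.
sweep_num_groups() {
  local num_groups
  local -a cmd
  for num_groups in "$@"; do
    # Hypothetical naming scheme: one output directory per group count
    cmd=(python mergemoe/merge-moe.py
      --task="$TASK"
      --num_samples_for_merging=64
      --num_groups="$num_groups"
      --merging_layers="$LAYERS"
      --merging_strategy="ours"
      --model_type="qwen"
      --output_dir="results/${TASK}/merged-qwen-g${num_groups}/"
      --checkpoint="$CHECKPOINT")
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"                    # print the command instead of running it
    else
      CUDA_VISIBLE_DEVICES=0 "${cmd[@]}"  # run on GPU 0, as in the example above
    fi
  done
}

sweep_num_groups 16 32 48
```

Printing first makes it easy to check the generated flags before launching several long-running merges; `DRY_RUN=0 bash sweep.sh` then executes them one after another.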



## Evaluation
We use the evaluation framework [eval_dclm](https://github.com/mlfoundations/dclm?tab=readme-ov-file#getting-started); follow its Getting Started steps to set up the evaluation environment.

#### Run evaluation
```shell
torchrun --nproc_per_node 2 \
  eval_dclm/eval_openlm_ckpt.py \
  --hf-model results/winogrande/merged-qwen/ours \
  --tokenizer /root/model/Qwen1.5-MoE-A2.7B \
  --eval-yaml "static/winogrande.yaml" \
  --output-file results/our_qwen_winogrande_results.json \
  --donot-compute-perplexity
```
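When several merged checkpoints need to be evaluated, the same `eval_dclm` command can be looped over their directories. This is a minimal sketch assuming the flags shown above; the `DRY_RUN` switch, the `eval_merged` helper, and writing each result file into its checkpoint directory are hypothetical choices, and the directory passed at the end is just the one from the example.

```shell
#!/usr/bin/env bash
# Sketch: run the eval_dclm command above for several merged checkpoints.
# Only prints the commands by default; run with DRY_RUN=0 to execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                                  # hypothetical switch: 1 = print only
TOKENIZER="${TOKENIZER:-/root/model/Qwen1.5-MoE-A2.7B}"  # tokenizer of the original model
EVAL_YAML="static/winogrande.yaml"

# Hypothetical helper: one evaluation command per merged-model directory.
eval_merged() {
  local model_dir
  local -a cmd
  for model_dir in "$@"; do
    model_dir="${model_dir%/}"  # drop a trailing slash, if any
    # Hypothetical layout: write each result file next to its checkpoint
    cmd=(torchrun --nproc_per_node 2
      eval_dclm/eval_openlm_ckpt.py
      --hf-model "$model_dir"
      --tokenizer "$TOKENIZER"
      --eval-yaml "$EVAL_YAML"
      --output-file "${model_dir}/winogrande_results.json"
      --donot-compute-perplexity)
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"  # print the command instead of running it
    else
      "${cmd[@]}"
    fi
  done
}

eval_merged results/winogrande/merged-qwen/ours
```

Keeping each result file inside its checkpoint directory avoids name collisions when several directories share the same final component (for example `ours`).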

