# Linear Mode Connectivity on enwik8 Dataset with MoE Models

This document outlines the full pipeline to train, fine-tune, align, and analyze Mixture-of-Experts models on the **enwik8** dataset using a GPT-2 architecture.

## Step 0: Prepare the Data

Download and preprocess the enwik8 dataset:

```bash
bash src/enwik8/data.sh
```

The processed data will be saved in: `./data/enwik8/`

---

## Step 1: Train a Baseline GPT-2 Model

```bash
CUDA_VISIBLE_DEVICES=0 python src/enwik8/train_model.py \
    --model-config-name gpt2 \
    --data-path ./data/enwik8/ \
    --model-save-dir /root/weights/enwik8
```

This will save the trained model checkpoints to: `/root/weights/enwik8/`

---

## Step 2: Fine-tune with Mixture-of-Experts (MoE)

Fine-tune the baseline checkpoint from Step 1 with an MoE layer at layer index 0 (two routed experts, no shared experts, top-2 routing):

```bash
CUDA_VISIBLE_DEVICES=0 python src/enwik8/finetune_moe.py \
    --model-path /root/weights/enwik8/Main-lr0.0007-iter80-size48/checkpoint_73000/ \
    --data-path ./data/enwik8/ \
    --model-save-dir /root/weights/enwik8/finetune \
    --moe-layer-indices 0 \
    --num-shared-experts 0 \
    --num-routed-experts 2 \
    --topk 2 \
    --seed 0
```

Repeat this step with a different random seed (e.g., `--seed 20`) to produce a second fine-tuned model. Steps 3 and 4 compare the `seed0` and `seed20` runs.
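
To make the `--num-routed-experts` and `--topk` flags concrete, here is a minimal sketch of softmax top-k routing for a single token. It is not taken from `finetune_moe.py`: the function name `top_k_route` and the renormalization of the selected gates are assumptions chosen for illustration, and the repository's router may differ.

```python
import math


def top_k_route(router_logits, k):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    Hypothetical helper: softmax over router logits, keep the top k,
    then renormalize the kept gates so they sum to 1 (an assumption).
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")

    # Numerically stable softmax over the router logits.
    max_logit = max(router_logits)
    exps = [math.exp(z - max_logit) for z in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k experts with the highest routing probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize over the selected experts only.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# With 2 routed experts and k=2, every token is sent to both experts;
# only the gate weights vary from token to token.
gates = top_k_route([0.3, -1.2], k=2)
print(sorted(gates))
```

Under this sketch, the configuration above (`--num-routed-experts 2 --topk 2`) never drops an expert, so the two experts differ only in how strongly each token weights them.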

---

## Step 3: Perform Expert Weight Matching

Match experts between two fine-tuned models:

```bash
CUDA_VISIBLE_DEVICES=0 python src/enwik8/weight_matching_moe.py \
    --model-a /root/weights/enwik8/finetune/lr0.0007-topk2-shared0-routed2-seed0/checkpoint_39999 \
    --model-b /root/weights/enwik8/finetune/lr0.0007-topk2-shared0-routed2-seed20/checkpoint_39999 \
    --data-path ./data/enwik8/
```
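
As a rough illustration of what expert matching involves, the sketch below finds the reordering of model B's experts that brings them closest to model A's experts in squared Euclidean distance. It is not the algorithm implemented in `weight_matching_moe.py`: the names `match_experts` and `apply_permutation` are hypothetical, each expert is assumed to be flattened into a plain list of floats, and the brute-force search is only practical for small expert counts such as the two routed experts used here.

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two flattened weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def match_experts(experts_a, experts_b):
    """Return (perm, cost) where experts_b[perm[i]] is aligned to experts_a[i].

    Hypothetical helper: tries every permutation and keeps the cheapest one.
    """
    n = len(experts_a)
    if len(experts_b) != n:
        raise ValueError("both models must have the same number of experts")

    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            squared_distance(experts_a[i], experts_b[perm[i]]) for i in range(n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def apply_permutation(rows, perm):
    """Reorder per-expert rows (expert weights or router rows) by perm."""
    return [rows[j] for j in perm]


# Toy example: model B holds the same two experts as model A, in swapped order.
experts_a = [[1.0, 0.0], [0.0, 1.0]]
experts_b = [[0.0, 1.0], [1.0, 0.0]]
perm, cost = match_experts(experts_a, experts_b)
aligned_b = apply_permutation(experts_b, perm)
print(perm, cost)
```

Reordering the experts leaves the model's function unchanged only if the router's per-expert rows are reordered with the same permutation, which is why `apply_permutation` is written to work on either. For larger expert counts, an assignment solver such as `scipy.optimize.linear_sum_assignment` avoids enumerating all permutations.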

---

## Step 4: Loss Barrier without Matching Algorithms

Evaluate linear mode connectivity (LMC) between the two fine-tuned models by computing the loss barrier from the results file:

```bash
# Quote the path: the square brackets would otherwise be treated as a shell glob pattern.
python src/enwik8/loss_barrier.py \
    --file-path "/root/results/enwik8/[lr0.0007-topk2-shared0-routed2-seed0+lr0.0007-topk2-shared0-routed2-seed20].json"
```
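
For orientation, the sketch below shows one common way to define the loss barrier: interpolate linearly between the two sets of weights, evaluate the loss at each interpolation coefficient, and take the largest amount by which that loss exceeds the straight line joining the two endpoint losses. This is an assumption about the definition, not a description of `loss_barrier.py` or of the JSON file's layout. The helper names and the toy numbers are made up for illustration.

```python
def interpolate(weights_a, weights_b, alpha):
    """Return (1 - alpha) * weights_a + alpha * weights_b, parameter by parameter.

    Hypothetical helper: weights are dicts mapping a parameter name to a
    flat list of floats.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: [
            (1 - alpha) * a + alpha * b
            for a, b in zip(weights_a[name], weights_b[name])
        ]
        for name in weights_a
    }


def loss_barrier(alphas, losses):
    """Largest gap between the interpolated loss and the endpoint baseline.

    alphas must include 0.0 (model A) and 1.0 (model B); losses[i] is the
    loss measured at interpolate(weights_a, weights_b, alphas[i]).
    """
    if len(alphas) != len(losses) or len(alphas) < 2:
        raise ValueError("need matching alphas and losses with at least 2 points")

    points = sorted(zip(alphas, losses))
    (first_alpha, loss_a), (last_alpha, loss_b) = points[0], points[-1]
    if first_alpha != 0.0 or last_alpha != 1.0:
        raise ValueError("alphas must span from 0.0 to 1.0")

    # Gap above the linear baseline (1 - alpha) * loss_a + alpha * loss_b.
    return max(
        loss - ((1 - alpha) * loss_a + alpha * loss_b) for alpha, loss in points
    )


# Toy values for illustration only, not measured results.
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
losses = [1.0, 1.25, 1.5, 1.25, 1.0]
print(loss_barrier(alphas, losses))  # 0.5 for these toy values
```

A barrier near zero means the loss stays close to the endpoint baseline along the whole path, which is the behaviour that linear mode connectivity refers to.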

