# Linear Mode Connectivity for ViT-MoE on ImageNet

This document describes the experimental pipeline used to evaluate **Linear Mode Connectivity (LMC)** between independently fine-tuned Vision Transformer (ViT) models augmented with Mixture-of-Experts (MoE) layers on the **ImageNet-1K** dataset. The process includes model training, expert permutation alignment, linear interpolation in parameter space, and visualization of the resulting loss landscape.
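
For reference, the interpolation step at the core of an LMC analysis can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's implementation: `interpolate_state` is a hypothetical helper, and each model is assumed to be a plain dictionary mapping parameter names to values that support scalar multiplication and addition (floats here; NumPy arrays or PyTorch tensors would behave the same way).

```python
def interpolate_state(state_a, state_b, alpha):
    """Return (1 - alpha) * state_a + alpha * state_b, parameter by parameter.

    Hypothetical helper: both inputs are dicts with identical keys.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("both models must expose the same parameter names")
    return {
        name: (1.0 - alpha) * state_a[name] + alpha * state_b[name]
        for name in state_a
    }


# Toy "models" with two scalar parameters each.
theta_a = {"w": 0.0, "b": 2.0}
theta_b = {"w": 4.0, "b": -2.0}

# Evenly spaced interpolation coefficients from 0 (model A) to 1 (model B).
alphas = [i / 4 for i in range(5)]
path = [interpolate_state(theta_a, theta_b, a) for a in alphas]
midpoint = path[2]  # alpha = 0.5
```

Evaluating the loss at each point of `path` gives the curve that the later plotting step visualizes.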

---

# Image Classification on ImageNet

## Download the ImageNet-1K Dataset (ILSVRC 2012)

To obtain the ImageNet 2012 dataset, navigate to the `/data/` directory and execute the following script:

```bash
bash src/imagenet/data.sh
```

The script saves the ImageNet data to the following directory:

```bash
/data/imagenet/
```

---

## Retrieve Pretrained Vision Transformer (ViT) Weights

Download the pretrained `ViT-Base-Patch16-224` model by running:

```bash
CUDA_VISIBLE_DEVICES=0 python src/imagenet/pretrained.py
```

The pretrained weights will be saved locally at:

```bash
./weights/imagenet/vit-base-patch16-224/
```

---

## Impact of Feedforward Reinitialization on Pretrained Transformer Performance

To evaluate the effect of reinitializing the Feedforward Network (FFN) in each Transformer layer, run the following script:

```bash
CUDA_VISIBLE_DEVICES=0 python src/imagenet/layer_replace_init.py \
    --model-path ./weights/imagenet/vit-base-patch16-224/ \
    --data-path /data/imagenet/
```

This script systematically inserts an MoE block at each layer, fine-tunes the model, and evaluates the resulting performance.

---

## Fine-tune ViT with a Mixture-of-Experts (MoE) Architecture

To fine-tune a Vision Transformer model with a Mixture-of-Experts (MoE) layer, run:

```bash
CUDA_VISIBLE_DEVICES=0 python src/imagenet/finetune_moe.py \
    --model-path ./weights/imagenet/vit-base-patch16-224/ \
    --data-path /data/imagenet/ \
    --moe-idx 0 --num-shared-experts 1 \
    --num-routed-experts 7 --topk 2 --seed 0
```

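Repeat the command with different seeds (e.g., `--seed 20`, `--seed 40`) to obtain multiple independently fine-tuned models. Note that the checkpoint names used in the following sections (e.g., `idx0-lr0.0005-seed-0-shared0-routed16-topk2`) appear to correspond to `--num-shared-experts 0 --num-routed-experts 16`; adjust either the flags above or the paths below so that they agree.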
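
If it is convenient to launch the seeds from one place, a small driver along these lines may help. It is only a sketch: `build_finetune_cmd`, `SEEDS`, and `DRY_RUN` are names introduced here for illustration, and the flags are copied verbatim from the command above rather than taken from the script's argument parser.

```python
import os
import subprocess

DRY_RUN = True  # set to False to actually launch the runs, one after another
SEEDS = [0, 20, 40]


def build_finetune_cmd(seed):
    """Assemble the fine-tuning command shown above for a given seed."""
    return [
        "python", "src/imagenet/finetune_moe.py",
        "--model-path", "./weights/imagenet/vit-base-patch16-224/",
        "--data-path", "/data/imagenet/",
        "--moe-idx", "0",
        "--num-shared-experts", "1",
        "--num-routed-experts", "7",
        "--topk", "2",
        "--seed", str(seed),
    ]


commands = [build_finetune_cmd(seed) for seed in SEEDS]
for cmd in commands:
    print(" ".join(cmd))
    if not DRY_RUN:
        # Same GPU selection as in the command above.
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
        subprocess.run(cmd, env=env, check=True)
```

With `DRY_RUN = True` the loop only prints the three commands, which makes it easy to check the arguments before committing GPU time.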

> **Note**: Setting `--topk` equal to `--num-routed-experts` results in a dense MoE configuration.
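
To make the note concrete, here is a toy illustration of top-k gating for a single token. The routing details used by `finetune_moe.py` (softmax placement, renormalisation, handling of shared experts) are not documented here, so this shows one common variant only; `route_topk` is a hypothetical helper. Shared experts are typically applied to every token regardless of the router, which is why they do not appear in the function.

```python
import math


def route_topk(router_logits, topk):
    """Return {expert_index: gate_weight} for one token.

    Softmax over all routed experts, keep the top-k, renormalise to sum to 1.
    """
    if not 1 <= topk <= len(router_logits):
        raise ValueError("topk must be between 1 and the number of routed experts")
    peak = max(router_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topk]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in sorted(chosen)}


logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3]  # 7 routed experts, as in the command above
sparse = route_topk(logits, topk=2)  # only two experts receive this token
dense = route_topk(logits, topk=7)   # every routed expert receives this token
```

In the `dense` case every routed expert gets a non-zero gate weight, so the layer behaves like a weighted ensemble of all experts rather than a sparse selection.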

---

## Expert Matching for Permutation Alignment (Optional but Recommended)

Due to the permutation invariance of experts in MoE models, aligning expert indices across different runs improves the accuracy and interpretability of LMC analysis. Use the following script to compute a matching between experts from two models:

```bash
CUDA_VISIBLE_DEVICES=0 python src/imagenet/expert_matching.py \
    --model-a weights/imagenet/finetune/idx0-lr0.0005-seed-0-shared0-routed16-topk2 \
    --model-b weights/imagenet/finetune/idx0-lr0.0005-seed-20-shared0-routed16-topk2 \
    --data-path /data/imagenet/
```
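
The idea behind the matching can be sketched on toy data. The criterion used by `expert_matching.py` is not specified here (the script takes `--data-path`, so it may well rely on data-dependent statistics); the sketch below assumes a plain squared distance between flattened expert weights and uses a brute-force search, which is only feasible for a handful of experts. For realistic expert counts, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. `match_experts` and `squared_distance` are hypothetical helpers. When the resulting permutation is applied to a real model, the router's output rows would need to be permuted in the same way so that the network's function is unchanged.

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two flattened weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def match_experts(experts_a, experts_b):
    """Return perm such that experts_b[perm[i]] is matched to experts_a[i].

    Brute-force search over all permutations; only feasible for a few experts.
    """
    n = len(experts_a)
    if len(experts_b) != n:
        raise ValueError("both models must have the same number of experts")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            squared_distance(experts_a[i], experts_b[perm[i]]) for i in range(n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm)


# Flattened toy "expert weights": model B holds similar experts in a different order.
experts_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
experts_b = [[1.0, 1.1], [0.9, 0.0], [0.0, 1.2]]

perm = match_experts(experts_a, experts_b)
aligned_b = [experts_b[j] for j in perm]  # reorder B's experts to line up with A's
```
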

---

## Visualize the Linear Mode Connectivity Curves

To generate loss curve plots from the interpolation results between matched models, run:

```bash
# Quote the paths so the shell does not interpret [...] as a glob pattern.
python src/imagenet/plot.py \
    --file-1 './results/imagenet/[idx0-lr0.0005-seed-0-shared0-routed16-topk2+idx0-lr0.0005-seed-20-shared0-routed16-topk2].json' \
    --file-2 './results/imagenet/[idx0-lr0.0005-seed-0-shared0-routed16-topk2+idx0-lr0.0005-seed-40-shared0-routed16-topk2].json' \
    --file-3 './results/imagenet/[idx0-lr0.0005-seed-20-shared0-routed16-topk2+idx0-lr0.0005-seed-40-shared0-routed16-topk2].json' \
    --output-dir 'plots/imagenet/[idx0-lr0.0005-shared0-routed16-topk2]'
```
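
A common way to summarise such a curve in a single number is the loss barrier: the largest amount by which the interpolated loss exceeds the straight line joining the two endpoint losses. The sketch below computes it from a list of interpolation coefficients and losses. The layout of the result JSON files is not described here, so the inputs are plain Python lists with made-up values, and `loss_barrier` is a hypothetical helper rather than part of `plot.py`. A barrier close to zero suggests the two models are linearly connected.

```python
def loss_barrier(alphas, losses):
    """Largest gap between the interpolated loss and the chord between endpoints.

    Assumes alphas[0] == 0 (model A) and alphas[-1] == 1 (model B).
    """
    if len(alphas) != len(losses) or len(alphas) < 2:
        raise ValueError("need matching alphas and losses with at least two points")
    loss_a, loss_b = losses[0], losses[-1]
    gaps = [
        loss - ((1.0 - alpha) * loss_a + alpha * loss_b)
        for alpha, loss in zip(alphas, losses)
    ]
    return max(gaps)


# Illustrative numbers only, not measured results.
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
losses = [1.0, 1.25, 1.75, 1.5, 1.5]
barrier = loss_barrier(alphas, losses)
```
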

---

