# Linear Mode Connectivity on AG News

This guide walks through the full pipeline for training a base transformer on the **AG News** dataset, fine-tuning it with a Mixture-of-Experts (MoE) layer, and evaluating the **Linear Mode Connectivity (LMC)** of the resulting MoE models.

---

## Step 1: Download and Prepare the Dataset

Run the following script to download and preprocess the dataset:

```bash
bash src/agnews/data.sh
```

---

## Step 2: Train the Base Transformer Model

Train a baseline transformer model on the AG News dataset:

```bash
CUDA_VISIBLE_DEVICES=0 python src/agnews/train_model.py \
    --data-path ./data/agnews/ \
    --save-dir ./weights/agnews/pretrained
```

---

## Step 3: Fine-tune with a Mixture-of-Experts (MoE) Layer

Insert a Mixture-of-Experts layer into the pretrained transformer from Step 2 and fine-tune it:

```bash
CUDA_VISIBLE_DEVICES=0 python src/agnews/finetune_moe.py \
    --model-path weights/agnews/pretrained/transformer-layers-1.flax \
    --data-path ./data/agnews/ \
    --save-dir ./weights/agnews/finetune \
    --num-experts 1 \
    --num-shared-experts 0 \
    --num-gated-experts 1 \
    --moe-idx 0 \
    --topk 0 \
    --seed 0
```

Repeat this step with a different random seed (e.g., `--seed 20`) to obtain a second fine-tuned model. Steps 4 and 5 interpolate between the `seed-0` and `seed-20` checkpoints.
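The checkpoint names used in Steps 4 and 5 appear to encode the Step 3 flags (`--moe-idx`, `--num-shared-experts`, `--num-gated-experts`, `--topk`, `--seed`). The hypothetical helpers below (`checkpoint_name`, `result_path`) reconstruct those names under that assumption. The pattern is inferred from the example paths in this guide, not from the training code, so confirm it against the files actually written to `--save-dir` and `results/agnews/`.

```python
# Hypothetical helpers: the naming pattern is inferred from the example paths
# in Steps 4 and 5 of this guide, not taken from the repository's code.
from pathlib import Path


def checkpoint_name(moe_idx: int, num_shared: int, num_gated: int,
                    topk: int, seed: int) -> str:
    """Return the expected checkpoint file name for one fine-tuning run."""
    return (f"idx-{moe_idx}-shared-{num_shared}-gated-{num_gated}"
            f"-topk-{topk}-seed-{seed}.flax")


def result_path(name_a: str, name_b: str,
                results_dir: str = "results/agnews") -> Path:
    """Return the expected interpolation result file for a checkpoint pair."""
    return Path(results_dir) / f"[{name_a}+{name_b}].json"


if __name__ == "__main__":
    a = checkpoint_name(moe_idx=0, num_shared=0, num_gated=1, topk=0, seed=0)
    b = checkpoint_name(moe_idx=0, num_shared=0, num_gated=1, topk=0, seed=20)
    print(Path("weights/agnews/finetune") / a)
    print(Path("weights/agnews/finetune") / b)
    print(result_path(a, b))
```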

---

## Step 4: Naive Linear Interpolation

Run naive linear interpolation between the two fine-tuned MoE models from Step 3 (seeds 0 and 20):

```bash
CUDA_VISIBLE_DEVICES=0 python src/agnews/naive_interpolate.py \
    --model-a weights/agnews/finetune/idx-0-shared-0-gated-1-topk-0-seed-0.flax \
    --model-b weights/agnews/finetune/idx-0-shared-0-gated-1-topk-0-seed-20.flax \
    --data-path ./data/agnews/
```
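Conceptually, linear interpolation builds θ(α) = (1 - α)·θ_A + α·θ_B for several values of α in [0, 1] and evaluates each interpolated model. The sketch below illustrates this in plain NumPy and is not the code in `naive_interpolate.py`. It assumes parameters are nested mappings of arrays and uses a hypothetical `evaluate` callable that returns a scalar loss. Matching parameters are blended element-wise with no alignment step between the two models, which is our reading of "naive" here. In JAX, `jax.tree_util.tree_map(lambda a, b: (1 - alpha) * a + alpha * b, params_a, params_b)` performs the same leaf-wise blend.

```python
# Sketch of naive linear interpolation between two parameter trees.
# Assumptions (not from the repository): parameters are nested mappings of
# NumPy arrays, and `evaluate` is a hypothetical callable returning a scalar
# loss for a parameter tree (e.g. mean cross-entropy on a held-out split).
from collections.abc import Mapping
from typing import Any, Callable, List, Tuple

import numpy as np


def interpolate(params_a: Any, params_b: Any, alpha: float) -> Any:
    """Return (1 - alpha) * params_a + alpha * params_b, leaf by leaf."""
    if isinstance(params_a, Mapping):
        # Both trees must have identical structure (same keys at every level).
        if params_a.keys() != params_b.keys():
            raise ValueError("parameter trees have different keys")
        return {k: interpolate(params_a[k], params_b[k], alpha) for k in params_a}
    return (1.0 - alpha) * np.asarray(params_a) + alpha * np.asarray(params_b)


def loss_curve(params_a: Any, params_b: Any,
               evaluate: Callable[[Any], float],
               num_points: int = 11) -> List[Tuple[float, float]]:
    """Evaluate the loss at `num_points` evenly spaced alphas in [0, 1]."""
    return [(float(alpha), float(evaluate(interpolate(params_a, params_b, alpha))))
            for alpha in np.linspace(0.0, 1.0, num_points)]


if __name__ == "__main__":
    # Toy two-parameter "models" and a toy loss that peaks between them.
    theta_a = {"dense": {"kernel": np.array([1.0, -1.0])}}
    theta_b = {"dense": {"kernel": np.array([-1.0, 1.0])}}

    def toy_loss(params):
        return 1.0 - float(np.sum(params["dense"]["kernel"] ** 2)) / 2.0

    for alpha, loss in loss_curve(theta_a, theta_b, toy_loss, num_points=5):
        print(f"alpha={alpha:.2f}  loss={loss:.3f}")
```

Recursing over the tree keeps the two models' parameters paired by name, so a mismatch in parameter names raises an error instead of silently blending unrelated weights.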

---

## Step 5: Calculate the Loss Barrier

Compute the loss barrier from the interpolation results produced in Step 4:

```bash
python src/agnews/loss_barrier.py \
    --file-path "results/agnews/[idx-0-shared-0-gated-1-topk-0-seed-0.flax+idx-0-shared-0-gated-1-topk-0-seed-20.flax].json"
```

The file name above corresponds to the two checkpoints interpolated in Step 4; adjust it to match the result file generated by your own run.
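Under one common definition, the loss barrier is the largest gap between the loss of the interpolated model and the straight line joining the two endpoint losses: B = max over α of [L(α) - ((1 - α)·L(0) + α·L(1))]. Some works instead subtract the average of the two endpoint losses. The sketch below computes the first variant from a list of `(alpha, loss)` pairs. Whether `loss_barrier.py` uses this exact definition, and how it parses its JSON file, are assumptions not confirmed by this guide.

```python
# Sketch of one common loss-barrier definition:
#   B = max_alpha [ L(alpha) - ((1 - alpha) * L(0) + alpha * L(1)) ]
# where L(alpha) is the loss of the model interpolated at alpha.
# Assumption: the input is a list of (alpha, loss) pairs that includes
# alpha = 0 and alpha = 1; reading the Step 4 JSON file is not shown here.
from typing import Sequence, Tuple


def loss_barrier(curve: Sequence[Tuple[float, float]]) -> float:
    """Return the maximum gap between the interpolated loss and the straight
    line joining the two endpoint losses."""
    points = sorted(curve)  # sort by alpha
    (a0, loss0), (a1, loss1) = points[0], points[-1]
    if a0 != 0.0 or a1 != 1.0:
        raise ValueError("curve must include alpha = 0 and alpha = 1")
    return max(loss - ((1.0 - a) * loss0 + a * loss1) for a, loss in points)


if __name__ == "__main__":
    # Made-up toy values, only to show the call.
    curve = [(0.0, 0.30), (0.25, 0.45), (0.5, 0.70), (0.75, 0.50), (1.0, 0.40)]
    print(round(loss_barrier(curve), 4))
```

Because the endpoints contribute a gap of zero, the result is never negative. A value close to zero suggests the two models are linearly mode-connected along this path, while a large positive value indicates a loss ridge between them.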

---
