# Block Structure Analysis of Sparse Activations in LLMs

This repository contains the code used to produce the experiments and visualizations in our paper.

The requirements are listed in `requirements.txt`. Install them via:

```bash
pip install -r requirements.txt
```

The repository depends on two additional GitHub repositories, TEAL and MoEfication, which must be cloned into the current project folder:

```bash
git clone https://github.com/FasterDecoding/TEAL.git
git clone https://github.com/thunlp/MoEfication.git
```

Excess Risk Experiments: This experiment compares the excess risk of sparse estimators with that of a dense estimator as a function of the sample size n. The model uses dimensionality p = 900 and k = 12 experts, and assumes that all inputs have the same size.

Use `GeneralizationErrorComputation.ipynb` to reproduce the results.
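
For intuition about why a sparse estimator can outperform a dense one at the same n, here is a minimal closed-form sketch. It is not the model implemented in the notebook: it assumes ordinary least squares on isotropic Gaussian features, splits the p = 900 features into k = 12 equal blocks of 75, and treats the sparse estimator as an oracle that knows which block is active. The function names are hypothetical.

```python
# Toy closed-form comparison (illustrative assumption, not the notebook's model):
# well-specified linear regression y = <w, x> + noise with isotropic Gaussian x.

P = 900          # total dimensionality, as in the experiment description
K = 12           # number of experts
BLOCK = P // K   # 75 features per expert, assuming equal-size experts


def ols_expected_excess_risk(d: int, n: int, sigma2: float = 1.0) -> float:
    """Expected excess risk of OLS on d isotropic Gaussian features with n samples.

    Uses E[(X^T X)^{-1}] = I / (n - d - 1) for Gaussian X, valid only when n > d + 1.
    """
    if n <= d + 1:
        raise ValueError(f"closed form needs n > d + 1 (got n={n}, d={d})")
    return sigma2 * d / (n - d - 1)


def dense_vs_oracle_sparse(n: int, sigma2: float = 1.0) -> tuple[float, float]:
    """Return (dense, sparse) expected excess risks for sample size n.

    The dense estimator regresses on all P features; the hypothetical sparse
    estimator is assumed to know the active block and regresses on BLOCK features.
    """
    dense = ols_expected_excess_risk(P, n, sigma2)
    sparse = ols_expected_excess_risk(BLOCK, n, sigma2)
    return dense, sparse


for n in (1000, 2000, 5000):
    dense, sparse = dense_vs_oracle_sparse(n)
    print(f"n={n:5d}  dense={dense:.4f}  sparse={sparse:.4f}")
```

The dense risk only becomes finite once n exceeds p + 1, while the oracle sparse estimator needs only n > 76, which is the qualitative gap the notebook's curves explore under its own assumptions.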

Estimation of the Closed Form of the Excess Risks: Lasso is run to estimate the closed form of the excess risks of the sparse and dense estimators, in a simplified setting with d = k = 100.

Use `Lasso_fit.ipynb` to reproduce the corresponding results.

Modular Structure: To reproduce the visualizations of the modular structure, follow these steps.

Clone the TEAL repository into the current folder and install it:

```bash
git clone https://github.com/FasterDecoding/TEAL.git
pip install -e ./TEAL
```

Also download, from Hugging Face, the models whose modular structure you want to analyze (Llama-2-7B-hf or Llama-3.1-8B).
The modular structure of a given LLM can then be generated with:
```bash
python find_block_structure.py \
--model_dir <root_dir of LLM model> \
--seed <SEED> \
--teal_path ./TEAL/models/<MODEL NAME> \
--cur_dir <PATH TO THE CURRENT PROJECT DIRECTORY> \
--teal_sparsity_type greedy \
--layer_name up_proj \
--layers <id of decoder layers to generate modular structure on>
```

Parameter descriptions:

- `--model_dir`: Root directory where the Hugging Face repository of the Llama-2-7B-hf or Llama-3.1-8B model is saved on the local machine.
- `--seed`: Seed used to initialize the random number generator.
- `--teal_path`: Path to the folder within the TEAL directory for the Llama-2-7B-hf or Llama-3.1-8B model, where the statistics for greedy and uniform thresholding are saved.
- `--cur_dir`: Path to the current project directory; can be `.`.
- `--teal_sparsity_type`: `greedy` or `uniform`, selecting the corresponding thresholding for activation sparsity.
- `--layer_name`: Name of the layer whose input activations are used to compute the modular structure, e.g. `up_proj` or `gate_proj`.
- `--layers`: Space-separated list of decoder layer indices, e.g. `0 1 2 3`. If omitted, block structures are generated for all decoder layers.
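
To build intuition for what the script analyzes, the sketch below shows magnitude-based activation sparsification followed by a feature co-activation matrix. TEAL prunes low-magnitude activations using thresholds calibrated offline; for simplicity this sketch instead drops a fixed fraction of entries per vector. The co-activation matrix is one plausible way to look for block structure and is not necessarily what `find_block_structure.py` computes. Both helpers are hypothetical.

```python
# Hypothetical helpers for intuition only; not the code in find_block_structure.py.

def sparsify(activations: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_zero = int(round(sparsity * len(activations)))
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(activations)), key=lambda i: abs(activations[i]))
    dropped = set(order[:n_zero])
    return [0.0 if i in dropped else a for i, a in enumerate(activations)]


def coactivation(masks: list[list[bool]]) -> list[list[float]]:
    """C[i][j] = fraction of tokens on which features i and j are both active."""
    if not masks:
        raise ValueError("need at least one mask")
    dim = len(masks[0])
    counts = [[0] * dim for _ in range(dim)]
    for mask in masks:
        active = [i for i, on in enumerate(mask) if on]
        for i in active:
            for j in active:
                counts[i][j] += 1
    return [[c / len(masks) for c in row] for row in counts]
```

For example, `masks = [[a != 0.0 for a in sparsify(t, 0.5)] for t in tokens]` turns a list of per-token activation vectors into binary masks, and `coactivation(masks)` then shows which features tend to fire together; groups of features with high mutual co-activation would appear as blocks.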

Robustness to Noise Experiments: To reproduce the noise-robustness results and confidence intervals, follow these steps.

Clone the MoEfication repository into the current folder:

```bash
git clone https://github.com/thunlp/MoEfication.git
```

Then run the following steps to generate a MoEfied T5-base model:

```bash
python ./MoEfication/examples/t5_cluster_example.py
python ./MoEfication/examples/t5-sst2-gt.py
python ./MoEfication/examples/t5-sst2-inf.py
python ./MoEfication/examples/t5_select_example.py 
python ./MoEfication/examples/t5-sst2-mlp.py
```
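
MoEfication groups the neurons of each FFN into experts and, at inference, keeps only the activations of the selected experts. That masking step can be sketched as follows, assuming a hypothetical `neuron_to_expert` list (the expert id of each FFN neuron) and a set of selected expert ids. This sketch does not read MoEfication's own split files.

```python
# Hypothetical sketch of expert-level activation sparsity in a MoEfied FFN.

def expert_mask(neuron_to_expert: list[int], selected: set[int]) -> list[bool]:
    """True for FFN neurons that belong to one of the selected experts."""
    return [e in selected for e in neuron_to_expert]


def apply_expert_sparsity(hidden: list[float], neuron_to_expert: list[int],
                          selected: set[int]) -> list[float]:
    """Zero the intermediate activations of all neurons outside the selected experts."""
    if len(hidden) != len(neuron_to_expert):
        raise ValueError("hidden and neuron_to_expert must have the same length")
    mask = expert_mask(neuron_to_expert, selected)
    return [h if keep else 0.0 for h, keep in zip(hidden, mask)]


# Example: T5-base has d_ff = 3072, so 96 equal experts hold 32 neurons each.
# A contiguous split is a placeholder; MoEfication derives its split by clustering.
split = [i // 32 for i in range(3072)]
sparse_hidden = apply_expert_sparsity([1.0] * 3072, split, selected={0, 5})
```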

We now have the feature splits for 96 experts, which can be used to impose activation sparsity.

Next, move `calc_baseline.py` and `calc_noise_robustness.py` into the MoEfication directory:

```bash
mv calc_baseline.py ./MoEfication/
mv calc_noise_robustness.py ./MoEfication/
```

Now run them. For the dense baseline:

```bash
python ./MoEfication/calc_baseline.py
```

For the sparse baselines with a particular seed:

```bash
python ./MoEfication/calc_noise_robustness.py --seed <SEED>
```
To visualize confidence intervals, repeat this for seeds 0 to 99.
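
The 100 runs can be driven by a small loop. The sketch below is equivalent to a shell `for` loop over seeds; the helper names are hypothetical, and it assumes it is launched from the project root after the scripts have been moved into `./MoEfication/`.

```python
import subprocess
import sys

SCRIPT = "./MoEfication/calc_noise_robustness.py"


def seed_commands(start: int = 0, stop: int = 100, script: str = SCRIPT) -> list[list[str]]:
    """Build one command per seed in [start, stop), i.e. seeds 0..99 by default."""
    return [[sys.executable, script, "--seed", str(seed)] for seed in range(start, stop)]


def run_sweep(commands: list[list[str]]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_sweep(seed_commands())` runs all seeds sequentially; `check=True` makes a failing seed raise immediately instead of silently leaving a gap in the aggregated results.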

These runs generate files that aggregate the accuracies. The confidence intervals can then be visualized with `RobustnessToNoiseVisualizations.ipynb`.
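
How the notebook computes its intervals is not stated here. One common choice is a normal-approximation interval for the mean accuracy across seeds, sketched below under the assumption that the per-seed accuracies have already been loaded into a list (the format of the aggregated files is not specified). `mean_ci` is a hypothetical helper.

```python
import math
import statistics


def mean_ci(accuracies: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) of a normal-approximation CI for the mean.

    z = 1.96 gives an approximate 95% interval; at least two values are required.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(accuracies)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, mean - z * sem, mean + z * sem
```

With 100 seeds, the Student-t quantile (about 1.98) is close to 1.96, so the normal approximation changes the interval width only slightly.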

`train_scratch.py` trains, from scratch, a small MoE model and a dense model with the same FFN parameter count, to compare their generalization performance and convergence speed.
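
Matching the FFN parameter count of a dense model and an MoE model is simple arithmetic when each expert is a narrower two-layer FFN. The sketch below assumes no biases, a non-gated FFN, and a linear router; the actual configuration in `train_scratch.py` is not given here, and both function names are hypothetical.

```python
def dense_ffn_params(d_model: int, d_ff: int) -> int:
    """Two weight matrices (d_model x d_ff and d_ff x d_model), biases ignored."""
    return 2 * d_model * d_ff


def moe_ffn_params(d_model: int, d_ff: int, n_experts: int,
                   include_router: bool = True) -> int:
    """MoE whose experts split the dense hidden width d_ff evenly."""
    if d_ff % n_experts:
        raise ValueError("d_ff must be divisible by n_experts")
    experts = n_experts * dense_ffn_params(d_model, d_ff // n_experts)
    # A linear router maps each token to one logit per expert.
    router = d_model * n_experts if include_router else 0
    return experts + router
```

Splitting the hidden width evenly keeps the expert parameters identical to the dense FFN; only the small router adds extra parameters.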

`moe_robust_main.py` is used for the probing-based MoE experiments.
