# Training-Free Activation Sparsity in Large Language Models


TEAL induces up to 40-50% model-wide activation sparsity in modern LLMs with minimal degradation, yielding up to 1.53-1.8x speedups in single-batch decoding.
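Here is a minimal sketch of what "activation sparsity" means. It assumes that sparsity comes from zeroing the lowest-magnitude entries of a hidden state, which is what the threshold calibration described in this README suggests. The helper `sparsify` is hypothetical and not part of the TEAL codebase. It sorts each vector only to keep the example clear; the calibrated thresholds shipped in `models/` suggest that TEAL applies precomputed thresholds at inference time instead.

```python
def sparsify(x: list[float], sparsity: float) -> list[float]:
    """Zero out (at least) the `sparsity` fraction of entries with the smallest magnitude.

    Hypothetical helper for illustration; entries tied with the threshold are also zeroed.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(x) * sparsity)  # number of entries to drop
    if k == 0:
        return list(x)
    # The k-th smallest magnitude acts as the threshold.
    threshold = sorted(abs(v) for v in x)[k - 1]
    return [v if abs(v) > threshold else 0.0 for v in x]


hidden = [0.1, -2.0, 0.3, -0.05, 1.5, 0.2]
print(sparsify(hidden, 0.5))  # [0.0, -2.0, 0.3, 0.0, 1.5, 0.0]
```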

## Contents

- [Install](#install)
- [Demo](#demo)
- [Inference Usage](#inference-usage)
- [Accuracy Usage](#accuracy-usage)

## Install

1. Navigate to TEAL:

```bash
cd TEAL
```

2. Set up the environment:

```bash
conda create -yn teal python=3.11
conda activate teal

pip install -e .
```

3. (Optional) To calibrate thresholds for your own models or run accuracy evaluations, install the evaluation extras:

```bash
pip install -e ".[eval]"
```

## Inference Usage

For convenience, we provide calibrated thresholds for Llama-2/3 and Mistral models in the `models/` folder.

1. Navigate to gpt-fast:

```bash
cd gpt-fast
```

2. Download the model weights and convert them to the gpt-fast format (`scripts/prepare.sh`):

```bash
python scripts/download.py --repo_id meta-llama/Llama-2-7b-hf --path $SAVE_PATH && \
  python scripts/convert_hf_checkpoint.py --checkpoint_dir $SAVE_PATH/meta-llama/Llama-2-7b-hf
```

3. Run dense inference (`scripts/base_run.sh`):

```bash
CUDA_VISIBLE_DEVICES=0 python generate.py \
    --compile \
    --checkpoint_path $SAVE_PATH/meta-llama/Llama-2-7b-hf/model.pth \
    --interactive
```

4. Run sparse inference! (`scripts/run.sh`):

```bash
CUDA_VISIBLE_DEVICES=0 python generate.py \
    --compile \
    --checkpoint_path $SAVE_PATH/meta-llama/Llama-2-7b-hf/model.pth \
    --hist_path ../models/Llama-2-7B/histograms \
    --sparsity 0.5 \
    --interactive
```
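Why can activation sparsity speed up single-batch decoding? Each linear layer then computes a matrix-vector product `y = W x`, and every zero in `x` means the matching column of `W` never has to be read. The sketch below uses a hypothetical helper in plain Python, not TEAL's actual kernels: it skips those columns and counts the weights it touches. Single-batch decoding is typically limited by memory bandwidth, so fewer weight reads is where the savings would come from.

```python
def sparse_matvec(W: list[list[float]], x: list[float]) -> tuple[list[float], int]:
    """Compute W @ x column by column, skipping columns where x is zero.

    Returns the result and the number of weight entries actually read.
    """
    rows = len(W)
    y = [0.0] * rows
    weights_read = 0
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue  # column j of W is never loaded
        for i in range(rows):
            y[i] += W[i][j] * xj
        weights_read += rows
    return y, weights_read


W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
x = [0.5, 0.0, -1.0, 0.0]  # 50% sparse activation
y, read = sparse_matvec(W, x)
print(y, read)  # [-2.5, -4.5] 4 (the dense product would read all 8 weights)
```

The result matches the dense product, but only half of the weights are read. In a real GPU kernel, the skipped columns become skipped memory loads.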

## Accuracy Usage

1. Navigate to TEAL:

```bash
cd TEAL
```

2. Construct histograms for threshold calibration (`scripts/grab_acts.bash`):

```bash
CUDA_VISIBLE_DEVICES=0 python teal/grab_acts.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --output_path $OUTPUT_PATH
```
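The following minimal sketch shows how a histogram can turn into a threshold. It assumes one histogram of activation magnitudes per tensor. The idea is to accumulate counts over calibration samples, then pick the smallest bin edge at which the target fraction of magnitudes has been reached. `build_histogram` and `threshold_for_sparsity` are hypothetical helpers. The bin layout and file format that `teal/grab_acts.py` actually writes may differ.

```python
def build_histogram(samples: list[float], num_bins: int, max_abs: float) -> list[int]:
    """Count |activation| values into `num_bins` equal-width bins over [0, max_abs]."""
    width = max_abs / num_bins
    counts = [0] * num_bins
    for v in samples:
        b = min(int(abs(v) / width), num_bins - 1)  # clamp outliers into the last bin
        counts[b] += 1
    return counts


def threshold_for_sparsity(counts: list[int], max_abs: float, sparsity: float) -> float:
    """Smallest bin edge where the cumulative count reaches `sparsity` of the total.

    Accurate only to the bin width.
    """
    if sparsity <= 0.0:
        return 0.0
    width = max_abs / len(counts)
    target = sparsity * sum(counts)
    cumulative = 0
    for b, c in enumerate(counts):
        cumulative += c
        if cumulative >= target:
            return (b + 1) * width  # upper edge of the bin that reaches the target
    return max_abs


calib = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
counts = build_histogram(calib, num_bins=10, max_abs=1.0)
print(threshold_for_sparsity(counts, max_abs=1.0, sparsity=0.5))  # 0.5
```

Because the counts are plain integers, histograms from several calibration batches can be merged by element-wise addition before the threshold is chosen.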

3. Run the perplexity test (`scripts/ppl_test.bash`):

```bash
CUDA_VISIBLE_DEVICES=0 python teal/ppl_test.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --teal_path $OUTPUT_PATH \
  --sparsity 0.5
```
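For reference, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The minimal sketch below assumes you already have per-token natural-log probabilities. The dataset and context length that `teal/ppl_test.py` uses are not specified here, and the function name is hypothetical.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood over the evaluated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.5 to every next token has perplexity 2.
print(round(perplexity([math.log(0.5)] * 4), 6))  # 2.0
```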

4. (Optional) Run block-wise greedy optimization (`scripts/greedyopt.bash`):

```bash
CUDA_VISIBLE_DEVICES=0 python teal/greedyopt.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --model_type Llama-2-7B \
  --teal_path $OUTPUT_PATH \
  --target_sparsity 0.9 \
  --base_step_size 0.05 \
  --last_fraction 0.25
```
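
Then rerun the perplexity test with `--greedy_flag` to evaluate using the greedily optimized sparsities:
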
```bash
CUDA_VISIBLE_DEVICES=0 python teal/ppl_test.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --teal_path $OUTPUT_PATH \
  --sparsity 0.5 \
  --greedy_flag
```
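One way to picture the greedy step is as repeatedly spending a fixed sparsity increment on whichever block tolerates it best. The sketch below is a generic greedy allocator built on that assumption. It is not the algorithm in `teal/greedyopt.py`. `cost(block, sparsity)` is a hypothetical stand-in for whatever error measure the script uses. `target_sparsity` and `step` only loosely mirror `--target_sparsity` and `--base_step_size`, and `--last_fraction` is not modeled.

```python
import math
from typing import Callable


def greedy_allocate(
    num_blocks: int,
    cost: Callable[[int, float], float],
    target_sparsity: float,
    step: float,
) -> list[float]:
    """Greedily raise per-block sparsity until the average reaches target_sparsity."""
    max_steps = round(1.0 / step)  # increments until a block is fully sparse
    total_steps = round(target_sparsity * num_blocks / step)
    steps = [0] * num_blocks  # integer counts avoid floating-point drift
    for _ in range(total_steps):
        best_block, best_delta = None, math.inf
        for b in range(num_blocks):
            if steps[b] >= max_steps:
                continue
            # Extra cost of giving block b one more increment of sparsity.
            delta = cost(b, (steps[b] + 1) * step) - cost(b, steps[b] * step)
            if delta < best_delta:
                best_block, best_delta = b, delta
        if best_block is None:
            break  # every block is already fully sparse
        steps[best_block] += 1
    return [s * step for s in steps]


# Hypothetical toy cost: later blocks are more sensitive to sparsity.
def toy_cost(block: int, sparsity: float) -> float:
    return (block + 1) * sparsity ** 2


print(greedy_allocate(4, toy_cost, target_sparsity=0.5, step=0.25))
# [1.0, 0.5, 0.25, 0.25]
```

The average sparsity hits the 0.5 target, but the less sensitive blocks absorb more of it. In this toy run, the least sensitive block ends up fully sparse, which is why a practical version would need a per-block cap.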