# WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference

<div  align="center">    
<img src="figures/overview.png"  width="360" height="200" />
</div>

This repository is the PyTorch implementation of **WINA** (**W**eight **I**nformed **N**euron **A**ctivation). WINA is a simple, training-free sparse activation framework that jointly considers hidden state magnitudes and the column-wise ℓ2-norms of weight matrices. We show that this leads to a sparsification strategy that obtains optimal approximation error bounds, with theoretical guarantees tighter than those of existing techniques. Empirically, WINA also outperforms state-of-the-art methods in average performance at the same sparsity levels across a diverse set of LLM architectures and datasets. These results position WINA as a new performance frontier for training-free sparse activation and a robust baseline for efficient LLM inference.
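
The selection criterion described above can be illustrated with a small, dependency-free sketch. It assumes the score of each hidden-state entry is the product of its magnitude and the ℓ2-norm of the matching weight column, and that the highest-scoring entries are kept. The function name `wina_mask` and the toy values are hypothetical and are not taken from this repository's code.

```python
import math

def wina_mask(x, W, sparsity):
    """Return a 0/1 mask over the entries of the hidden state x.

    x: list of floats of length n (hidden state).
    W: list of m rows, each of length n (weight matrix applied as W @ x).
    sparsity: fraction of entries of x to zero out.
    Hypothetical helper for illustration only.
    """
    n = len(x)
    # Column-wise l2-norm of W: one norm per input dimension.
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n)]
    # Assumed score: hidden-state magnitude weighted by the column norm.
    scores = [abs(x[j]) * col_norms[j] for j in range(n)]
    keep = n - int(sparsity * n)
    # Indices of the `keep` largest scores.
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:keep]
    mask = [0] * n
    for j in top:
        mask[j] = 1
    return mask

x = [0.9, -0.2, 0.5, 0.1]
W = [[0.1, 3.0, 0.2, 0.0],
     [0.0, 4.0, 0.1, 0.0]]
mask = wina_mask(x, W, sparsity=0.5)
print(mask)  # [0, 1, 1, 0]
```

In this toy case, a magnitude-only criterion would keep entries 0 and 2. The weight-informed score instead keeps entry 1, whose small activation is multiplied by a large weight column.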

The current release supports:
* Llama-2-7B
* Llama-3-8B
* Mistral-7B
* Phi-4 (14B)

## Contents
- [Install](#install)
- [Sparsity Allocation](#sparsity-allocation)
- [Evaluation](#evaluation)
- [Reproduction](#reproduction)

## Install
```bash
conda create -yn wina python=3.11
conda activate wina

pip install -r requirements.txt
```

## Sparsity Allocation
Instead of applying a uniform sparsity across the model, we assign a separate sparsity to each weight matrix using the greedy algorithm proposed in TEAL.

💡 Note: If you want to assign uniform sparsity to all matrices, you can **skip this part**.

1. Construct hidden-state histograms.
```bash
python wina/grab_acts.py --model_name [MODEL_PATH] --output_path [OUTPUT_PATH] --sparse_mode [SPARSE_MODE] --transform(Optional)
```
2. Allocate sparsity.
```bash
python wina/greedyopt.py --model_name [MODEL_PATH] --output_path [OUTPUT_PATH] --sparse_mode [SPARSE_MODE] --model_type [MODEL_TYPE] --transform(Optional)
```
💡 Note: The `--transform` flag is optional and enables tensor transformation. Omit it to skip the transformation.
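
One common use of such histograms is to read off a magnitude threshold for a given target sparsity. The sketch below illustrates that idea only; it is an assumption about how histograms can be used and is not taken from `wina/grab_acts.py`. The function name, bin edges, and counts are all hypothetical.

```python
def threshold_from_histogram(bin_edges, counts, sparsity):
    """Return the smallest right bin edge below which at least
    `sparsity` of the recorded magnitudes fall.

    bin_edges: ascending edges, len(counts) + 1 entries.
    counts: number of samples per bin.
    Hypothetical helper for illustration only.
    """
    total = sum(counts)
    cumulative = 0
    for right_edge, count in zip(bin_edges[1:], counts):
        cumulative += count
        if cumulative >= sparsity * total:
            return right_edge
    return bin_edges[-1]

# Toy histogram of hidden-state magnitudes (made-up numbers).
bin_edges = [0.0, 0.1, 0.2, 0.3, 0.4]
counts = [40, 30, 20, 10]
threshold = threshold_from_histogram(bin_edges, counts, sparsity=0.5)
print(threshold)  # 0.2
```

Entries whose magnitude (or score) falls below the returned threshold would then be zeroed at inference time; finer bins give a threshold closer to the exact quantile.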

ℹ️ Arguments
* [MODEL_PATH]: Path to the base model; can be a remote or a local path.
* [OUTPUT_PATH]: Path to save outputs, including hidden-state histograms and sparsity allocation results.
* [SPARSE_MODE]: Can be **WINA** or **TEAL**
* [MODEL_TYPE]: Can be one of the following four:
    * Llama-3-8B
    * Llama-2-7B
    * Mistral-7B
    * Phi-4-14B

## Evaluation
```bash
python eval.py --base_model [MODEL_PATH] --save_path [OUTPUT_PATH] --sparsity [SPARSITY] --sparse_mode [SPARSE_MODE] --greedy
```
💡 Note: `[SPARSITY]` is the target sparsity level. The `--greedy` flag enables per-matrix sparsity assignment; complete [Sparsity Allocation](#sparsity-allocation) before using it.

## Reproduction
```bash
bash scripts/llama-2-7b.bash [MODEL_PATH]
bash scripts/llama-3-8b.bash [MODEL_PATH]
bash scripts/Mistral-7b.bash [MODEL_PATH]
bash scripts/phi_4_14b.bash [MODEL_PATH]
```
