
## Code for "HSR-Enhanced Sparse Attention Acceleration"

### 1 Environment Setup

Install the dependencies by running the following command:

``` Bash
pip install -r requirements.txt
```

Remark: We ran our experiments on a single $80$ GB GPU.


### 2 Evaluation

Run the following commands to evaluate the perplexity of three mainstream LLMs on `PaulGrahamEssays` using top-$r$ indices. Here, we choose $r$ from $\{2^2, 2^4, 2^6, 2^8, 2^{10}, 2^{12}, 2^{15}\}$.

``` Bash
# Llama-3.1-8B-Instruct
python perplexity_eval.py --modified knn --model meta-llama/Llama-3.1-8B-Instruct
# Mistral-Nemo-Instruct-2407
python perplexity_eval.py --modified knn --model mistralai/Mistral-Nemo-Instruct-2407
# Phi-3.5-mini-instruct
python perplexity_eval.py --modified knn --model microsoft/Phi-3.5-mini-instruct
```
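
For intuition about what "top-$r$ indices" means, here is a minimal sketch of attention restricted to the $r$ highest-scoring keys for a single query. It is only an illustration under stated assumptions: the function name `top_r_attention`, the scaled dot-product scoring, and the pure-Python lists are hypothetical and are not taken from `perplexity_eval.py` or from the `--modified knn` code path.

```python
import math

def top_r_attention(query, keys, values, r):
    """Softmax attention for one query, restricted to the r largest scores.

    Hypothetical illustration; not the repository's implementation.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    d = len(query)
    # Scaled dot-product score between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Indices of the r largest scores (all keys if r >= len(keys)).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:r]
    # Numerically stable softmax over the selected indices only.
    m = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - m) for i in top}
    z = sum(weights.values())
    dim_v = len(values[0])
    out = [sum(weights[i] / z * values[i][j] for i in top) for j in range(dim_v)]
    return out, sorted(top)

# Toy example: 4 keys, keep only the top 2.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 1.0], [3.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [10.0], [2.0], [20.0]]
out, idx = top_r_attention(query, keys, values, r=2)
print(idx, out)
```

With $r$ equal to the number of keys, the sketch reduces to ordinary softmax attention; smaller $r$ discards the low-scoring keys before normalization.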

### 3 Visualization

#### 3.1 $\mathsf{ReLU}^\alpha$ and $\exp$ visualization

Please refer to `draw_figs/draw_relu_exp.ipynb` for details on drawing the $\mathsf{ReLU}^\alpha$ and $\exp$ figure (Figure 1 in the paper).

<div align=center><img src="draw_figs/figs/exp_relu_power.png"></div>
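
As a rough companion to the figure, the two activation shapes can be tabulated with a few lines of Python. This sketch assumes the definition $\mathsf{ReLU}^\alpha(x) = \max(x, 0)^\alpha$ with $\alpha > 0$; the helper name `relu_power` and the sample points are hypothetical and do not come from the notebook.

```python
import math

def relu_power(x, alpha):
    """Assumed definition: ReLU^alpha(x) = max(x, 0) ** alpha, for alpha > 0."""
    return max(x, 0.0) ** alpha

# Compare ReLU^alpha for a few exponents against exp on sample points.
for x in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    powers = [relu_power(x, alpha) for alpha in (1, 2, 3)]
    print(f"x={x:+.1f}  ReLU^1,2,3={powers}  exp={math.exp(x):.4f}")
```

Unlike $\exp$, $\mathsf{ReLU}^\alpha$ is exactly zero for non-positive inputs under this definition.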

#### 3.2 Perplexity evaluation results visualization

Please refer to `draw_figs/draw_perplexity.ipynb` for details on drawing the perplexity figure (Figure 2 in the paper).

<div align=center><img src="draw_figs/figs/perplexity.png"></div>
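
As a reminder of the quantity being plotted, perplexity is commonly defined as $\exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(x_i \mid x_{<i})\right)$ over $N$ tokens. The sketch below computes it from a list of per-token natural-log probabilities. Whether `perplexity_eval.py` computes it in exactly this way is an assumption, and the function name `perplexity` is hypothetical.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability over the tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 0.25 to every token has perplexity close to 4.
print(perplexity([math.log(0.25)] * 4))
```

Lower values mean the model assigns higher probability to the reference text.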






