# Reproducing the main results of the ICLR 2025 submission

Below are four experiment sets with the commands used to evaluate LLaMA models equipped with baseline attention (full, top-k) and our Top-Theta attention. Together, these experiment sets reproduce the main results of the paper submitted to ICLR.

## 1. Prefill-only QA tasks (HellaSwag, ARC-C, ARC-E) - LLaMA2-7b, LLaMA-3-8B, LLaMA2-70b

These tasks use the QA datasets HellaSwag, ARC-C, and ARC-E. All Top-Theta thresholds are calibrated on the same dataset that is evaluated (using a different split).

First, run the experiments. Each run is uniquely identified by a TIMESTAMP: its statistics are dumped into the subdirectory `products/<TIMESTAMP>`, and its acc_norm score is written alongside the TIMESTAMP to a text file inside the results-Llama directory. After the runs finish, you can visualize acc_norm as a function of the kept attention elements with `notebooks/4-accuracy-kept_attn-kept_vrow-tradeoff_QA.ipynb`; just update the TIMESTAMPs of the runs you want to plot.
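
To collect the TIMESTAMPs for the notebook, it can help to list the most recent run directories. The helper below is only a sketch: `latest_runs` is a hypothetical function, not part of this repository, and it assumes that each run creates one subdirectory directly under `products/` and that modification time reflects run order.

```bash
# Hypothetical helper (not part of the repository): list the N most recently
# modified entries under a products root, newest first.
# Assumes one subdirectory per run, as in products/<TIMESTAMP>.
latest_runs() {
  local root="${1:-products}" n="${2:-5}"
  # ls -1t sorts by modification time, newest first; errors are silenced so a
  # missing directory simply yields no output.
  ls -1t "$root" 2>/dev/null | head -n "$n"
}
```

For example, `latest_runs products 3` prints the names of the three newest run directories, which can then be copied into the notebook.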


Experiments:
```bash
# llama2-7b (hellaswag, arc_challenge, arc_easy):
#   baseline (no sparsification)
python test_llama.py --llama 2-7 --task arc_challenge --mode 3 --timestamps
python test_llama.py --llama 2-7 --task arc_easy --mode 3 --timestamps
python test_llama.py --llama 2-7 --task hellaswag --mode 3 --timestamps
#   top-k
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc  
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
#   top-theta
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task arc_easy --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 2-7 --task hellaswag --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact

# llama3-8B (hellaswag, arc_challenge, arc_easy):
#   baseline (no sparsification)
python test_llama.py --llama 3-8 --task arc_challenge --mode 3 --timestamps
python test_llama.py --llama 3-8 --task arc_easy --mode 3 --timestamps
python test_llama.py --llama 3-8 --task hellaswag --mode 3 --timestamps
#   top-k
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc  
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
#   top-theta
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task arc_easy --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8 --task hellaswag --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact

# llama 2-70b (hellaswag, arc_challenge, arc_easy):
#   baseline (no sparsification)
python test_llama.py --llama 3-70 --task arc_challenge --mode 3 --timestamps
python test_llama.py --llama 3-70 --task arc_easy --mode 3 --timestamps
python test_llama.py --llama 3-70 --task hellaswag --mode 3 --timestamps
#   top-k
python test_llama.py --llama 3-70 --task arc_challenge --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task arc_challenge --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task arc_challenge --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task arc_challenge --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_challenge --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_challenge --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_easy --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task arc_easy --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task arc_easy --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task arc_easy --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_easy --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_easy --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task hellaswag --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task hellaswag --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task hellaswag --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-70 --task hellaswag --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task hellaswag --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-70 --task hellaswag --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
#   top-theta
python test_llama.py --llama 3-70 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-70 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-70 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-70 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_easy --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-70 --task arc_easy --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-70 --task arc_easy --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --llama 3-70 --task arc_easy --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_easy --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70 --task arc_easy --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70 --task hellaswag --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc   
python test_llama.py --llama 3-70 --task hellaswag --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc
python test_llama.py --llama 3-70 --task hellaswag --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc  
python test_llama.py --llama 3-70 --task hellaswag --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact 
python test_llama.py --llama 3-70 --task hellaswag --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact 
python test_llama.py --llama 3-70 --task hellaswag --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact 

# When done -> summarize the QA results using notebooks/4-accuracy-kept_attn-kept_vrow-tradeoff_QA.ipynb

```
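
The top-k and Top-Theta commands above follow a regular grid (mode, task, placement, k), so they can also be generated with a loop. The sketch below prints the commands instead of running them; `print_sweep` is a hypothetical helper, and the flag combinations are copied from the listing above (`--sdc exact` only for pre-softmax runs, `--calib_add_sigma 0.1` and `--calib_tac` only for Top-Theta). The three `--mode 3` baseline runs per model are not included.

```bash
# Hypothetical helper (not part of the repository): print, without running,
# the top-k (--mode 1) and Top-Theta (--mode 0) sweep for one model.
# Usage: print_sweep <llama-id> <k> [<k> ...]
print_sweep() {
  local model="$1"; shift
  local mode task placement k calib tac sdc cmd
  for mode in 1 0; do
    calib=""; tac=""
    if [ "$mode" = 0 ]; then
      # Top-Theta runs carry the calibration flags used in the listing.
      calib=" --calib_add_sigma 0.1"
      tac=" --calib_tac"
    fi
    for task in arc_challenge arc_easy hellaswag; do
      for placement in post-softmax pre-softmax; do
        sdc=""
        # Only the pre-softmax runs pass --sdc exact.
        if [ "$placement" = pre-softmax ]; then sdc=" --sdc exact"; fi
        for k in "$@"; do
          cmd="python test_llama.py --llama $model --task $task --mode $mode --k $k"
          cmd="$cmd --layerk 0:512,1:512$calib --placement $placement --timestamps$tac --vmc$sdc"
          printf '%s\n' "$cmd"
        done
      done
    done
  done
}

# Same k values as the 7b/8b listings; the 70b listing uses 32 128 512.
print_sweep 2-7 16 32 64 128 256 512
```

Review the printed commands first; piping the output to `bash` would then execute them one after another.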



## 2. Prefill-only QA task (MedMCQA) - LLaMA-3-8B + threshold calibration on different dataset
This set of experiments focuses on the MedMCQA dataset. First, run the experiments. Each run is uniquely identified by a TIMESTAMP: its statistics are dumped into the subdirectory `products/<TIMESTAMP>`, and its acc_norm score is written alongside the TIMESTAMP to a text file inside the results-Llama directory. After the runs finish, you can visualize acc_norm as a function of the kept attention elements with `notebooks/4-accuracy-kept_attn-kept_vrow-tradeoff_QA.ipynb`; just update the TIMESTAMPs of the runs you want to plot.

All Top-Theta thresholds are calibrated on a different dataset (arc_challenge).
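
The calibration step can be written as a loop over placement and k. This is a sketch only: `print_calibration` is a hypothetical helper that prints the `--calibrate_only` commands rather than running them, with flags taken from the Step 1 listing in this section.

```bash
# Hypothetical helper (not part of the repository): print, without running,
# the calibration-only commands for one model and one calibration task.
# Usage: print_calibration <llama-id> <calibration-task> <k> [<k> ...]
print_calibration() {
  local model="$1" calib_task="$2"; shift 2
  local placement k sdc cmd
  for placement in post-softmax pre-softmax; do
    sdc=""
    # Only the pre-softmax runs pass --sdc exact.
    if [ "$placement" = pre-softmax ]; then sdc=" --sdc exact"; fi
    for k in "$@"; do
      cmd="python test_llama.py --calibrate_only --llama $model --task $calib_task --mode 0 --k $k"
      cmd="$cmd --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement $placement --timestamps --calib_tac --vmc$sdc"
      printf '%s\n' "$cmd"
    done
  done
}

# Thresholds for LLaMA-3-8B are calibrated on arc_challenge.
print_calibration 3-8 arc_challenge 16 32 64 128 256 512
```

Keeping the calibration task as an explicit argument makes it visible that the thresholds come from a different dataset than the one being evaluated.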

Experiments:

```bash
# Step 1 - Calibrate thresholds
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
# Step 2 - Run Baseline + Top-k
python test_llama.py --llama 3-8 --task medmcqa --mode 3 --timestamps
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax  --timestamps --vmc --sdc exact
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
python test_llama.py --llama 3-8 --task medmcqa --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --timestamps --vmc 
# Step 3 - Edit scripts/move_thresholds-LLaMA-3-8B.sh: fill in the timestamps of the Step 1 calibration runs that match each threshold configuration, then run it to move the LLaMA-3-8B thresholds from "products" into the "thresholds" directory
# Step 4 - Run Top-theta with the thresholds loaded from the "thresholds" directory
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc               --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-post-softmax_k512,512,16
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc               --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-post-softmax_k512,512,32
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc               --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-post-softmax_k512,512,64
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc               --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-post-softmax_k512,512,128
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc               --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-post-softmax_k512,512,256
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc               --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-post-softmax_k512,512,512
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact   --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-pre-softmax_k512,512,16
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact   --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-pre-softmax_k512,512,32
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact   --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-pre-softmax_k512,512,64
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact   --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-pre-softmax_k512,512,128
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact   --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-pre-softmax_k512,512,256
python test_llama.py --llama 3-8 --task medmcqa --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact   --calib_load_path thresholds/Llama-3-8B_arc_challenge_placement-pre-softmax_k512,512,512

# When done -> summarize the MedMCQA benchmark results using notebooks/4-accuracy-kept_attn-kept_vrow-tradeoff_QA.ipynb

```
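The Step 1 calibration sweep above follows a regular pattern: the same flags for every run, six values of `--k`, two placements, and `--sdc exact` added only for the pre-softmax runs. As a convenience, the sketch below prints those 12 commands from a loop so they can be reviewed before being executed. `print_calibration_cmds` is a hypothetical helper written for this README, not a script shipped with the repository; the flags are copied from the listing above.

```bash
#!/usr/bin/env bash
# Sketch: print the 12 Step 1 calibration commands (2 placements x 6 values of k).
# print_calibration_cmds is a hypothetical helper, not part of the repository.
print_calibration_cmds() {
    local placement k extra
    for placement in post-softmax pre-softmax; do
        # In the listing above, only the pre-softmax runs pass --sdc exact.
        extra=""
        if [ "$placement" = "pre-softmax" ]; then extra=" --sdc exact"; fi
        for k in 16 32 64 128 256 512; do
            echo "python test_llama.py --calibrate_only --llama 3-8 --task arc_challenge --mode 0 --k $k --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement $placement --timestamps --calib_tac --vmc$extra"
        done
    done
}

# Print the commands for inspection.
print_calibration_cmds
# To execute them one after another instead, pipe the output to a shell:
# print_calibration_cmds | bash
```

Printing rather than executing keeps the sketch safe to try: each calibration run gets its own TIMESTAMP, so it is worth checking the generated commands against the listing before launching them.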



## 3. Generative task (HumanEval) - LLaMA-3-8B-Instruct, LLaMA-3-70B-Instruct
This set of experiments focuses on HumanEval, a Python code generation dataset.

For Top-theta, two calibration approaches are evaluated:
1. Calibrate thresholds on the same dataset (HumanEval itself).
2. Calibrate thresholds on a different dataset (arc_challenge).

To reproduce, first run the experiments. Each run is uniquely identified by a TIMESTAMP: its statistics are dumped into the subdirectory `products/<TIMESTAMP>`, and its pass@1 score is written alongside the TIMESTAMP to a text file inside the results-Llama directory. After the runs finish, you can visualize pass@1 as a function of the kept attention elements and as a function of the kept V-rows with `notebooks/5-accuracy-kept_attn-kept_vrow-tradeoff_humaneval.ipynb`; just update the TIMESTAMPS to those of the runs you want to plot.

Experiments:
```bash
########### LLaMA-3-8B-Instruct HumanEval ##############
# Step 1 - Baseline (no sparsification) and top-k
python gen_llama.py --timestamps --llama 3-8i --mode 3 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
# Step 2 - Top-theta with the thresholds calibrated on HumanEval itself
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 32  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 64  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 128 --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 256 --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 32  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 64  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 128 --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 256 --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#   Top-theta with proper calibration on a different dataset from a different domain (arc_challenge)
# Step 3 - Only calibrate the Top-theta thresholds for LLaMA-3-8B-Instruct (3-8i); no evaluation is run
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 32  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 64  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 128 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 256 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 32  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 64  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 128 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-8i --task arc_challenge --mode 0 --k 256 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
# Step 4 - Move the th.txt files to thresholds/<directory>, where the directory is named after the model, dataset, placement (pre/post-softmax) and k values on which the thresholds were calibrated
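# Sketch for step 4, under assumptions not confirmed by this README: th.txt is assumed to sit directly
# under products/<TIMESTAMP>/, and stage_thresholds is a hypothetical helper (not part of the repository).
# The destination name mirrors the --calib_load_path values used in step 5.
stage_thresholds() {  # usage: stage_thresholds <TIMESTAMP> <pre-softmax|post-softmax> <k>
    local dst="thresholds/Llama-3-8B-Instruct_arc_challenge_placement-${2}_k256,256,${3}"
    mkdir -p "$dst" && cp "products/${1}/th.txt" "$dst/"
}
# Example, after replacing <TIMESTAMP> with the id of the matching step 3 run:
#   stage_thresholds <TIMESTAMP> pre-softmax 32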

# Step 5 - Run LLaMA-3-8B-Instruct (3-8i) using the thresholds calibrated on arc_challenge
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 32  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-pre-softmax_k256,256,32
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 64  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-pre-softmax_k256,256,64
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 128 --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-pre-softmax_k256,256,128
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 256 --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-pre-softmax_k256,256,256
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 32  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-post-softmax_k256,256,32
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 64  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-post-softmax_k256,256,64
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 128 --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-post-softmax_k256,256,128
python gen_llama.py --timestamps --llama 3-8i --mode 0 --k 256 --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-8B-Instruct_arc_challenge_placement-post-softmax_k256,256,256


############ LLaMA-3-70B-Instruct HumanEval  ##########
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Step 1 - Baseline (no sparsification) and top-k
python gen_llama.py --timestamps --llama 3-70i --mode 3 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 16  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 16  --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
python gen_llama.py --timestamps --llama 3-70i --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
#   Top-theta with proper calibration on a different domain
#   step 2 - just calibrate thresholds for Top-theta of Llama-3-70i
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 16  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 32  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 64  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 128 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 256 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calibrate_only --calib_tac --vmc 
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 16  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 32  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 64  --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 128 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
python test_llama.py --llama 3-70i --task arc_challenge --mode 0 --k 256 --layerk 0:256,1:256 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calibrate_only --calib_tac --vmc --sdc exact
#   step 3 - move the th.txt files to thresholds/<specially named directory>, where the directory is named according to the (k, pre/post, model, dataset) on which it was calibrated

#   step 4 - run 3-70i using thresholds calibrated on arc_challenge
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 16  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-pre-softmax_k256,256,16
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 32  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-pre-softmax_k256,256,32
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 64  --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-pre-softmax_k256,256,64
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 128 --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-pre-softmax_k256,256,128
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 256 --layerk 0:256,1:256 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-pre-softmax_k256,256,256
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 16  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-post-softmax_k256,256,16
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 32  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-post-softmax_k256,256,32
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 64  --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-post-softmax_k256,256,64
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 128 --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-post-softmax_k256,256,128
python gen_llama.py --timestamps --llama 3-70i --mode 0 --k 256 --layerk 0:256,1:256 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --num_samples_per_task 1 --max_seq_len 2048 --prompt_prefix "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are an expert programmer that helps to complete Python code based on the initial function name and docstring from the user initial.<|eot_id|><|start_header_id|>user<|end_header_id|>Complete the following Python function according to its docstring. After the function code is complete do not write any following tests or invocations of this function, do not repeat the implementation again. Finish your text after you finish the function code.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --calib_load_path thresholds/Llama-3-70B-Instruct_arc_challenge_placement-post-softmax_k256,256,256

# When done -> summarize humaneval benchmark results using notebooks/4-accuracy-kept_attn-kept_vrow-tradeoff_humaneval.ipynb

```
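
The directory naming used in step 3 can be read off the `--calib_load_path` arguments above: model name, calibration task, placement, then the k of layers 0 and 1 followed by the k of the remaining layers. Below is a minimal sketch of a helper that composes such a name. The function `threshold_dir` is hypothetical (it is not part of this repository), and the location of `th.txt` inside `products/<TIMESTAMP>` is an assumption, so the move command is only printed, not executed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose the thresholds directory name following the
# pattern seen in the --calib_load_path arguments of this README.
threshold_dir() {
  local model="$1"      # e.g. Llama-3-70B-Instruct
  local task="$2"       # calibration task, e.g. arc_challenge
  local placement="$3"  # pre-softmax or post-softmax
  local layerk="$4"     # k of layers 0 and 1, e.g. 256,256
  local k="$5"          # k of the remaining layers, e.g. 16
  printf 'thresholds/%s_%s_placement-%s_k%s,%s\n' \
    "$model" "$task" "$placement" "$layerk" "$k"
}

# Dry run: print the move command instead of executing it.
# TIMESTAMP is a placeholder; the th.txt location is an assumption.
TIMESTAMP="<TIMESTAMP>"
dest="$(threshold_dir Llama-3-70B-Instruct arc_challenge pre-softmax 256,256 16)"
echo "mkdir -p $dest && mv products/$TIMESTAMP/th.txt $dest/"
```

Replace the placeholder with the timestamp of the matching calibration run and check the printed command before running it by hand.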



## 4. Generative task (LongBench) - llama-3.1-8B-Instruct
This set of experiments focuses on the LongBench dataset. We used its **qmsum** and **gov_report** tasks, each having 200 examples. Note that for the qmsum task we first experimented with just the first 20 of the 200 examples, then proceeded to the full 200.

To reproduce, first run the experiments. Each run is uniquely identified by a TIMESTAMP; its statistics are dumped into a subdirectory `products/<TIMESTAMP>`, and the rouge score is written alongside the TIMESTAMP in a text file inside the results-Llama directory. After running the experiments, you can visualize the rouge score as a function of kept attention elements and as a function of kept V-rows using `notebooks/6-accuracy-kept_attn-kept_vrow-tradeoff_longbench.ipynb`; just update the TIMESTAMPS of the runs that you want to plot.

All Top-theta calibrations are done on a different dataset (arc_challenge).
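
The calibration sweep of step 1 in the experiments of this section repeats one command over several k values and both placements. As a minimal dry-run sketch, the loop below prints those commands rather than running them. The function name `calibration_commands` is hypothetical; the flags are copied from the commands in this README.

```bash
#!/usr/bin/env bash
# Print (do not execute) one calibrate-only command per (placement, k) pair.
# Usage: calibration_commands <model> <layerk> <k> [<k> ...]
calibration_commands() {
  local model="$1" layerk="$2"
  shift 2
  local placement k extra
  for placement in post-softmax pre-softmax; do
    extra=""
    # The pre-softmax runs in this README additionally pass --sdc exact.
    if [ "$placement" = "pre-softmax" ]; then
      extra=" --sdc exact"
    fi
    for k in "$@"; do
      echo "python test_llama.py --calibrate_only --llama $model --task arc_challenge --mode 0 --k $k --layerk $layerk --calib_add_sigma 0.1 --placement $placement --timestamps --calib_tac --vmc$extra"
    done
  done
}

calibration_commands 3.1-8i 0:512,1:512 16 32 64 128 256 512
```

Printing first makes it easy to review the sweep; once the list looks right, the output can be piped to `bash` to actually run it.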

Experiments:
```bash
# step 1 - calibrate-only the 3.1-8i model
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 16  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 32  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 64  --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 128 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 256 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 512 --layerk 0:512,1:512 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact

# Step 2 - Manually modify scripts/move_thresholds-LLaMA-3.1-8B-Instruct.sh - provide the correct timestamps according to the threshold configurations and run it to move thresholds of LLaMA-3.1-8B-Instruct from "products" into "thresholds" directory

# Step 3 - Run qmsum - quick 20-examples run - (Baseline, Top-k, Top-Theta with preloading of thresholds)
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 3 --timestamps  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 32  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps --num_tasks 20 
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 64  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps --num_tasks 20 
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps --num_tasks 20 
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps --num_tasks 20 
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps    --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 32  --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps    --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 64  --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps    --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps    --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps    --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,512  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 32  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,32  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 64  --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,64  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,128  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,256  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,512  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 32  --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,32  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 64  --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,64  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,128  --num_tasks 20
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,256  --num_tasks 20
# When done -> summarize qmsum benchmark (20 examples) results using notebooks/6-accuracy-kept_attn-kept_vrow-tradeoff_longbench.ipynb

#### Run qmsum - all 200 examples - (Baseline, Top-k, Top-Theta with preloading of thresholds)
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 756 --layerk 0:756,1:756 --calib_add_sigma 0.1 --placement post-softmax --timestamps --calib_tac --vmc 
python test_llama.py --calibrate_only --llama 3.1-8i --task arc_challenge --mode 0 --k 756 --layerk 0:756,1:756 --calib_add_sigma 0.1 --placement pre-softmax  --timestamps --calib_tac --vmc --sdc exact
pushd scripts && source move_thresholds-LLaMA-3.1-8B-Instruct.sh && popd
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 3 --timestamps
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps  
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps  
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps  
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 756 --layerk 0:756,1:756 --placement post-softmax --vmc --timestamps
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 1 --k 756 --layerk 0:756,1:756 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,128
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,256
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,512
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 756 --layerk 0:756,1:756 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k756,756,756
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,128
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,256
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,512
python gen_llama.py --dataset longbench_qmsum --llama 3.1-8i --mode 0 --k 756 --layerk 0:756,1:756 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k756,756,756
# When done, summarize the qmsum benchmark results (all 200 examples) using notebooks/6-accuracy-kept_attn-kept_vrow-tradeoff_longbench.ipynb

#### Run gov_report - all 200 examples - (Baseline, Top-k, Top-Theta with preloaded thresholds)
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 3 --timestamps
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps  
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps  
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --timestamps  
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 756 --layerk 0:756,1:756 --placement post-softmax --vmc --timestamps
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 1 --k 756 --layerk 0:756,1:756 --placement pre-softmax --sdc exact --vmc --timestamps
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 128 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,128
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 256 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,256
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 512 --layerk 0:512,1:512 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k512,512,512
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 756 --layerk 0:756,1:756 --placement post-softmax --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-post-softmax_k756,756,756
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 128 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,128
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 256 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,256
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 512 --layerk 0:512,1:512 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k512,512,512
python gen_llama.py --dataset longbench_gov_report --llama 3.1-8i --mode 0 --k 756 --layerk 0:756,1:756 --placement pre-softmax --sdc exact --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps  --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-pre-softmax_k756,756,756

# When done, summarize the gov_report benchmark results (all 200 examples) using notebooks/6-accuracy-kept_attn-kept_vrow-tradeoff_longbench.ipynb
```
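
The Top-Theta (`--mode 0`) commands above follow a regular pattern: two placements, four values of `--k`, and a threshold file whose name encodes the placement and the k values. As a convenience, the sweep can be generated with a small loop instead of being copied line by line. The sketch below is a dry run that only prints the eight gov_report commands. The function name `top_theta_cmds` is hypothetical and not part of the repository, and the rule that layers 0 and 1 use 512 except in the k=756 runs is inferred from the listed commands rather than from the code.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print the Top-Theta (mode 0) commands for one LongBench dataset.
# `top_theta_cmds` is an illustrative helper, not a script shipped with the repository.

top_theta_cmds() {
  local dataset="$1"
  local placement k layer_k extra
  for placement in post-softmax pre-softmax; do
    # The pre-softmax runs listed above additionally pass `--sdc exact`.
    extra=""
    if [ "$placement" = "pre-softmax" ]; then
      extra=" --sdc exact"
    fi
    for k in 128 256 512 756; do
      # In the listed commands, layers 0 and 1 use 512, except in the k=756 runs.
      layer_k=512
      if [ "$k" -eq 756 ]; then
        layer_k=756
      fi
      echo "python gen_llama.py --dataset ${dataset} --llama 3.1-8i --mode 0 --k ${k} --layerk 0:${layer_k},1:${layer_k} --placement ${placement}${extra} --vmc --calib_add_sigma 0.1 --calib_sample_frac 1.0 --calib_tac --timestamps --calib_load_path thresholds/Llama-3.1-8B-Instruct_arc_challenge_placement-${placement}_k${layer_k},${layer_k},${k}"
    done
  done
}

# Print the commands for review; nothing is executed here.
top_theta_cmds longbench_gov_report
```

Review the printed commands against the list above before running them. To execute them, pipe the output to `bash`. Passing `longbench_qmsum` instead of `longbench_gov_report` should produce the corresponding qmsum commands, assuming the same threshold files are used.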


