To run in Google Colab, create the following nested folders:

sample_data -> HeadKV -> data -> PaulGrahamEssays
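
As a minimal sketch, the folder chain above can also be created from a Colab cell rather than through the file browser. The helper name `make_essay_dir` and its `base` parameter are hypothetical, and the default assumes the notebook's working directory is the one that already contains `sample_data`.

```python
import os

def make_essay_dir(base="."):
    """Create sample_data/HeadKV/data/PaulGrahamEssays under `base` and return its path.

    `base` is an assumption: it should point at the directory that holds
    `sample_data` (the notebook's working directory by default).
    """
    path = os.path.join(base, "sample_data", "HeadKV", "data", "PaulGrahamEssays")
    # exist_ok=True makes the call safe to repeat if the cell is re-run.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `make_essay_dir()` in a cell creates any missing intermediate folders and returns the path of the essay folder.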

Upload the essays to the PaulGrahamEssays folder.
Upload run_needle_in_haystack.py and llama_simple.py to the HeadKV folder.
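
Before installing anything, it can help to confirm that the uploads landed in the right place. The check below is only a sketch: `missing_uploads` and `REQUIRED_SCRIPTS` are hypothetical names, and the default path assumes the cell runs from the directory that contains `sample_data`.

```python
import os

# The two scripts the instructions above ask you to upload to HeadKV.
REQUIRED_SCRIPTS = ("run_needle_in_haystack.py", "llama_simple.py")

def missing_uploads(headkv_dir="sample_data/HeadKV"):
    """Return a list of expected items that are not yet in place."""
    missing = [
        name for name in REQUIRED_SCRIPTS
        if not os.path.isfile(os.path.join(headkv_dir, name))
    ]
    essay_dir = os.path.join(headkv_dir, "data", "PaulGrahamEssays")
    # The essay folder must exist and contain at least one entry.
    if not os.path.isdir(essay_dir) or not os.listdir(essay_dir):
        missing.append("data/PaulGrahamEssays/ (no essays found)")
    return missing
```

An empty list from `missing_uploads()` means both scripts and at least one essay are present; otherwise the returned names show what still needs to be uploaded.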

# install dependencies
!pip install datasets
!pip install minference
!pip install fuzzywuzzy
!pip install rouge
!pip install flash_attn

# log in to your Hugging Face account using an access token; make sure you have access to the Llama and Mistral models
!huggingface-cli login

# install the remaining dependencies
!pip install tiktoken rouge_score
!pip install transformers==4.46.0
!pip install git+https://github.com/microsoft/MInference.git

# execute this line to run BalanceKV
!python sample_data/HeadKV/run_needle_in_haystack.py --kv_type "weightedbw"