# Code for CLUSTERGEN: TOKEN GENERATION IN SUBLINEAR TIME AND MEMORY WITH CLUSTERING KV CACHE
 

## LongEval experiment
- Implemented in ``run_longeval.py``
- E.g., ``CUDA_VISIBLE_DEVICES=0 python run_longeval.py --kv_cache_method sink --hh_size 1024 --recent_size 2048``

## LongBench experiment
- Implemented in ``run_longbench.py``
