# Fake Quantization in SDLLM

Run the run.sh script to evaluate SDLLM perplexity (PPL). It currently supports LLaMA2-7B, LLaMA2-13B, LLaMA3-8B, and Qwen2.5-14B.
Select the model family by setting os.environ['MODEL_VERSION'] to one of the available options: "MY_LLAMA" or "MY_QWEN".
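
As a minimal sketch of that setting, the snippet below sets MODEL_VERSION from Python and rejects values other than the two options listed above. The helper select_model_version and the tuple ALLOWED_MODEL_VERSIONS are hypothetical names used only for illustration; they are not part of SDLLM.

```python
import os

# The two options named in this README. This tuple is an illustrative
# assumption, not an object defined by SDLLM.
ALLOWED_MODEL_VERSIONS = ("MY_LLAMA", "MY_QWEN")


def select_model_version(version: str) -> str:
    """Set MODEL_VERSION for the current process, rejecting unknown values."""
    if version not in ALLOWED_MODEL_VERSIONS:
        raise ValueError(
            f"MODEL_VERSION must be one of {ALLOWED_MODEL_VERSIONS}, got {version!r}"
        )
    os.environ["MODEL_VERSION"] = version
    return version


# Hypothetical helper call: pick the LLaMA option before the model code runs.
select_model_version("MY_LLAMA")
```

The variable only affects the current process and its children, so it has to be set before the code that reads it is executed.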

Spike fake quantization is configured in two stages: floating-point values to spike counts, then spike counts to ternary spikes.
To enable the second stage, set os.environ['SPIKE_ON'] = '1'.
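
The two stages can be illustrated with a small pure-Python sketch. It is not the SDLLM implementation: the uniform scale, the clamp to a fixed number of time steps, and the function names to_spike_count, to_ternary_spikes, and fake_quantize are all assumptions made for illustration. Only the SPIKE_ON switch comes from this README.

```python
import os
from typing import List


def to_spike_count(x: float, scale: float, max_count: int) -> int:
    """Stage 1 (assumed form): map a float to a signed spike count.

    The value is divided by a step size, rounded, and clamped to
    [-max_count, max_count].
    """
    count = round(x / scale)
    return max(-max_count, min(max_count, count))


def to_ternary_spikes(count: int, steps: int) -> List[int]:
    """Stage 2 (assumed form): unroll a signed count into ternary spikes.

    Returns `steps` values from {-1, 0, +1} whose sum equals `count`.
    """
    if abs(count) > steps:
        raise ValueError("count does not fit into the given number of steps")
    sign = 1 if count > 0 else -1
    return [sign] * abs(count) + [0] * (steps - abs(count))


def fake_quantize(x: float, scale: float = 0.25, steps: int = 4) -> float:
    """Quantize and immediately dequantize, so the output stays a float."""
    count = to_spike_count(x, scale, steps)
    if os.environ.get("SPIKE_ON") == "1":
        # Second stage: go through the ternary spike train and sum it back.
        spikes = to_ternary_spikes(count, steps)
        count = sum(spikes)
    return count * scale
```

In this sketch the second stage is lossless with respect to the count, so fake_quantize returns the same value with or without SPIKE_ON; the actual second stage in SDLLM may behave differently.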

Run the sp_train.sh script to perform sparse training with rotation.
Before running it, configure the quantile q inside the SrLlamaForCausalLM class.
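
This README does not say how q is used. One common reading, assumed here, is that q is a magnitude quantile that sets a sparsity threshold: values whose absolute value falls below the q-quantile are zeroed. The sketch below shows only that assumed reading; magnitude_threshold and sparsify are hypothetical helpers, not methods of SrLlamaForCausalLM.

```python
from typing import List, Sequence


def magnitude_threshold(values: Sequence[float], q: float) -> float:
    """Return the q-quantile of |values| (nearest rank, no interpolation)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    magnitudes = sorted(abs(v) for v in values)
    index = min(len(magnitudes) - 1, int(q * len(magnitudes)))
    return magnitudes[index]


def sparsify(values: Sequence[float], q: float) -> List[float]:
    """Zero every entry whose magnitude is below the q-quantile threshold."""
    threshold = magnitude_threshold(values, q)
    return [v if abs(v) >= threshold else 0.0 for v in values]


# Assumed usage: a larger q zeroes a larger share of the entries.
activations = [0.1, -0.5, 0.05, 2.0, -1.0, 0.3, -0.2, 0.8]
sparse = sparsify(activations, q=0.5)
```

Under this reading, q directly trades sparsity against fidelity, which is why it would need to be chosen per model.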