# Double Sparsity

## Install

~~~
git clone https://github.com/andy-yang-1/DoubleSparse.git
cd DoubleSparse
conda create -n sparse python=3.9 -y
conda activate sparse
pip install -r requirement.txt

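# Pick ONE of the three PyTorch setups below.

# Option 1: no offloading, without torch.compile
pip3 install torch torchvision torchaudio

# Option 2: no offloading, with torch.compile (PyTorch nightly, CUDA 12.1)
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

# Option 3: offloading on A10G (PyTorch 2.1 plus DGL)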
pip3 install torch==2.1
pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu121/repo.html
~~~

## Weight

~~~
# Standard weights: prepare the checkpoint with gpt-fast
git clone https://github.com/pytorch-labs/gpt-fast.git
cd gpt-fast
export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
./scripts/prepare.sh $MODEL_REPO

# Offloading weights: convert the gpt-fast checkpoint prepared above
cd path/to/DoubleSparse
python3 offloading/scripts/convert_hf_checkpoint.py --checkpoint_dir ~/gpt-fast/checkpoints/meta-llama/Llama-2-7b-chat-hf --model_name meta-llama/Llama-2-7b-chat-hf
~~~
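
Before running generation, it can help to confirm that both checkpoint files exist. The snippet below is a minimal sketch, not part of DoubleSparse: `check_checkpoints` and the `CKPT_DIR` variable are hypothetical names introduced here, and the default directory and file names (`model.pth`, `model_offloading.pth`) are simply the ones used by the commands in this README.

```shell
# Hypothetical helper (not shipped with DoubleSparse): report which of the
# checkpoint files referenced in this README exist in a given directory.
check_checkpoints() {
  dir="$1"
  rc=0
  for f in model.pth model_offloading.pth; do
    if [ -f "$dir/$f" ]; then
      echo "found: $dir/$f"
    else
      echo "missing: $dir/$f"
      rc=1
    fi
  done
  # Non-zero exit status if at least one file is missing.
  return "$rc"
}

# Assumed default: the checkpoint path used elsewhere in this README.
# Override CKPT_DIR if your checkpoints live somewhere else.
CKPT_DIR="${CKPT_DIR:-$HOME/gpt-fast/checkpoints/meta-llama/Llama-2-7b-chat-hf}"
check_checkpoints "$CKPT_DIR" || echo "some checkpoint files are missing"
```

If you only use the non-offloading setup, a missing `model_offloading.pth` is expected and can be ignored.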


## Run

~~~
# normal
cd models
python3 generate.py --checkpoint_path ~/gpt-fast/checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --max_new_tokens 2048 --batch_size 4

# offloading (run from the DoubleSparse repo root, not from models/)
cd offloading
python3 generate.py --checkpoint_path ~/gpt-fast/checkpoints/meta-llama/Llama-2-7b-chat-hf/model_offloading.pth --max_new_tokens 2048 --batch_size 16
~~~