Code for **Computation and Memory-Efficient Model Compression with Gradient Reweighting**

## Before pruning

- Download the model weights from Meta's official website.
  - When pruning the Llama family, we do not yet support the Hugging Face checkpoint format, so you need to download the weights from Meta's official website.
- Download the datasets from Hugging Face.
  - C4 is used for training; WikiText2 and PTB are used to evaluate perplexity. All three are easy to download from Hugging Face.
- Install the dependencies listed in `requirements.txt`.
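
The install step can be sketched as follows. This is a minimal example that assumes you run it from the repository root and that `python` points at the environment you want to use; the `install_requirements` helper is hypothetical and not part of this repository.

```bash
# Hypothetical helper: install the dependencies listed in a requirements file.
install_requirements() {
  local req="${1:-requirements.txt}"
  if [ ! -f "$req" ]; then
    echo "Cannot find $req; run this from the repository root." >&2
    return 1
  fi
  # Use the pip that belongs to the active Python interpreter.
  python -m pip install -r "$req"
}

install_requirements || echo "Skipped dependency installation." >&2
```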

## Log on wandb

To record the training loss, sparsity, and perplexity during pruning, we support logging to wandb. Set your API key before you start:

```bash
export WANDB_API_KEY='your-api-key'
```
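
If you are not sure whether the key is set, a small pre-flight check can help. The sketch below assumes the pruning code uses the standard wandb client, which honors the `WANDB_MODE` environment variable; the `configure_wandb` helper is hypothetical and not part of this repository.

```bash
# Hypothetical helper: fall back to offline logging when no API key is set.
configure_wandb() {
  if [ -z "${WANDB_API_KEY:-}" ]; then
    echo "WANDB_API_KEY is not set; logging offline instead." >&2
    # In offline mode wandb writes runs to disk without contacting the server.
    export WANDB_MODE=offline
  fi
}

configure_wandb
```

Runs recorded offline can be uploaded later with `wandb sync`.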

## Start pruning

We provide pruning code for the Llama 2 and Llama 3 series. To start pruning, run:

```bash
bash llama2-7b.sh
```

To prune other models, such as Llama 3, change the config path in the script.
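
To find where the config path is set, you can search the launch script for it. This is only a sketch: it assumes the path appears on a line containing the word "config", and the `show_config_lines` helper is hypothetical.

```bash
# Hypothetical helper: list the lines of a script that mention a config,
# with line numbers, so you know which path to edit.
show_config_lines() {
  grep -n -i 'config' "$1"
}

if [ -f llama2-7b.sh ]; then
  show_config_lines llama2-7b.sh
fi
```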

## Test zero-shot performance

We use lm-evaluation-harness to test zero-shot performance. Please check its repository for installation instructions.
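
A zero-shot run might look like the sketch below. It assumes a recent lm-evaluation-harness release that provides the `lm_eval` command and a checkpoint already in Hugging Face format. The model path and the task list are placeholders, not the settings used by the authors, and the `run_zero_shot` helper is hypothetical.

```bash
# Hypothetical helper: print the lm_eval command, then run it if possible.
run_zero_shot() {
  local model_dir="$1" tasks="$2"
  local cmd=(lm_eval --model hf
    --model_args "pretrained=${model_dir}"
    --tasks "${tasks}"
    --num_fewshot 0
    --batch_size 8)
  echo "${cmd[*]}"
  # Only execute when lm_eval is installed and the checkpoint directory exists.
  if command -v lm_eval >/dev/null 2>&1 && [ -d "${model_dir}" ]; then
    "${cmd[@]}"
  fi
}

# Placeholder path and example tasks; replace both with your own.
run_zero_shot "${MODEL_DIR:-/path/to/pruned-model-hf}" "${TASKS:-arc_easy,hellaswag}"
```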

> [!Note]
> Since lm-evaluation-harness only supports the HF format, we provide `convert.py`, taken from the official Llama repository, to convert a Meta-format model to an HF-format model.