<h1 align="center">
    Quantization-Aware Sparsification (QAS)
</h1>

<p align="center">
    <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/license-MIT-blue"></a>
</p>

<p align="center">
  <a href="#background">Background</a> •
  <a href="#usage">Usage</a> •
  <a href="#code">Code</a> •
  <a href="#acknowledgements">Acknowledgements</a> •
  <a href="#citation">Citation</a> •
  <a href="#license">License</a>
</p>

## Background
We introduce Quantization-Aware Sparsification (QAS), a novel compression framework that sparsifies a model while accounting for quantization that has already been applied. In our experiments, QAS performs comparably to sparsifying first and then quantizing ($\mathbf{S} \to \mathbf{Q}$). The core idea is that, during sparsification, we compute the pruning mask from the original weights rather than the quantized ones, but still apply the mask to the quantized weights. This avoids the suboptimal collisions associated with quantizing first and then sparsifying ($\mathbf{Q} \to \mathbf{S}$).
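The masking idea above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code in the `sparsity` folder. It assumes a simple symmetric round-to-nearest quantizer as a stand-in for AWQ and a global magnitude threshold, and it reads "collisions" as distinct weights that land on the same quantized value. The function names (`quantize_uniform`, `magnitude_mask`, `qas`, `quant_then_sparsify`) are hypothetical.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Symmetric round-to-nearest quantization (a stand-in for AWQ, assumed here)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels
    return np.round(w / scale) * scale


def magnitude_mask(w, sparsity):
    """Boolean mask that drops the `sparsity` fraction of smallest-magnitude entries."""
    n_prune = int(round(sparsity * w.size))
    mask = np.ones(w.size, dtype=bool)
    if n_prune > 0:
        # Stable sort so that ties are broken by position, deterministically.
        order = np.argsort(np.abs(w).ravel(), kind="stable")
        mask[order[:n_prune]] = False
    return mask.reshape(w.shape)


def qas(w, bits, sparsity):
    """QAS-style sketch: mask from the ORIGINAL weights, applied to the quantized ones."""
    w_q = quantize_uniform(w, bits)
    mask = magnitude_mask(w, sparsity)
    return w_q * mask


def quant_then_sparsify(w, bits, sparsity):
    """Q -> S baseline sketch: mask computed from the quantized weights."""
    w_q = quantize_uniform(w, bits)
    mask = magnitude_mask(w_q, sparsity)
    return w_q * mask


if __name__ == "__main__":
    # -0.26 and 0.24 quantize to the same magnitude at 3 bits, so Q -> S
    # cannot tell which of the two was originally larger.
    w = np.array([0.9, -0.26, 0.24, 0.05])
    print("QAS:   ", qas(w, bits=3, sparsity=0.5))
    print("Q -> S:", quant_then_sparsify(w, bits=3, sparsity=0.5))
```

In this toy example the two middle weights share a quantized magnitude, so the Q → S mask has to break the tie arbitrarily, while the QAS mask still prunes the weight that was smaller before quantization.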

## Usage

### Installing

1. Recursively clone the repository.

2. Create a `cache` folder in the root directory.
```bash
mkdir cache
```

3. Download the project dependencies.
```bash
pip install -r requirements.txt
```

4. Clone AWQ. 
```bash
git clone https://github.com/mit-han-lab/llm-awq
```

5. Follow the AWQ installation instructions.

## Code

The `scripts` folder includes the scripts used to run the experiments for the paper. The `model` folder contains all the code used to collect models from HuggingFace. The `sparsity` folder contains the traditional magnitude-based sparsification algorithm, as well as the newly proposed QAS.

### `baseline.sh`
`baseline.sh` provides baseline results for combining AWQ quantization with magnitude-based sparsification. It can be run using the following command.
```bash
./baseline.sh --model <model_name> --order <quant-first|sparsity-first> --widths <w1 w2 ...> --sparsities <s1 s2 ...>
```

### `qas.sh`
`qas.sh` provides results for combining AWQ quantization with our newly proposed QAS sparsification scheme. It can be run using the following command.
```bash
./qas.sh --model <model_name> --widths <w1 w2 ...> --sparsities <s1 s2 ...>
```

## Acknowledgements

## Citation

## License

<a href="LICENSE">MIT License</a>
