# Quantization Program with GPTQ Triton Kernels

The entry point for quantization and evaluation is [main.py](./main.py).

The full list of command-line arguments is defined in [parse_args.py](./parse_args.py).

The Triton kernels are implemented in the [gptq_triton](./gptq_triton) folder.
They cover Hessian accumulation, MSE grid selection, GPTQ error propagation, and min-pivot order computation.

The Triton kernels are auto-tuned at runtime, which can lead to slightly different floating-point rounding between runs.
Small numerical differences are therefore expected even with the same random seed.
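
To illustrate why a different kernel configuration can change results, the sketch below sums the same four numbers with two different block sizes. It is a plain-Python illustration only, not code from this repository: `sequential_sum`, `blocked_sum`, and `block_size` are hypothetical names, with `block_size` standing in for a tile size that an auto-tuner might pick.

```python
def sequential_sum(values):
    """Add values left to right in double precision."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def blocked_sum(values, block_size):
    """Reduce fixed-size blocks first, then add the partial sums.

    `block_size` is a hypothetical stand-in for an auto-tuned tile size.
    """
    partials = [
        sequential_sum(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
    return sequential_sum(partials)


# The exact sum is 2.0, but each 1.0 is lost when added to +/-1e16.
values = [1e16, 1.0, -1e16, 1.0]

one_block = blocked_sum(values, block_size=4)   # ((1e16 + 1) - 1e16) + 1
two_blocks = blocked_sum(values, block_size=2)  # (1e16 + 1) + (-1e16 + 1)

print(one_block, two_blocks)  # 1.0 0.0
```

Floating-point addition is not associative, so regrouping the same inputs changes the rounding. The inputs here are chosen to exaggerate the effect; differences in practice are typically much smaller.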

## Dependencies

<p>
<a href="https://www.python.org/downloads/"><img alt="Python 3.14.4" src="https://img.shields.io/badge/Python-3.14.4-blue.svg"></a>
<a href="https://pytorch.org/get-started/"><img alt="PyTorch 2.11.0" src="https://img.shields.io/badge/PyTorch-2.11.0-orange.svg"></a>
<a href="https://huggingface.co/docs/transformers/installation"><img alt="Transformers 4.55.4" src="https://img.shields.io/badge/Transformers-4.55.4-yellow.svg"></a>
</p>

Install the dependencies with either of the following commands.
```bash
pip install torch transformers==4.55.4 accelerate lm-eval wandb matplotlib ipykernel
```
or
```bash
pip install -r requirements.txt
```

## Example Commands

The example commands below show the basic usage for each supported method.

Please replace *CKPT* with the actual path to the Hugging Face model checkpoint folder.

**GPTQ**

```bash
python main.py \
    --model-dir=CKPT \
    --quant-group-size=128 \
    --quant-bit-width=4 \
    --quant-order=act \
    --do-rtn=False \
    --quant-use-entropy-mode=none \
    --quant-do-clip=True \
    --quant-use-mse=True \
    --seqlen=2048 \
    --data-train-n-samples=256 \
    --do-quant=True \
    --batch-size=1 \
    --data-seed=42 \
    --eval-openllm=False \
    --save-model=True
```

**HPTQ**

```bash
python main.py \
    --model-dir=CKPT \
    --quant-group-size=128 \
    --quant-bit-width=4.125 \
    --quant-order=act \
    --do-rtn=False \
    --quant-use-entropy-mode=strict_h \
    --quant-do-clip=False \
    --quant-use-mse=False \
    --seqlen=2048 \
    --data-train-n-samples=256 \
    --do-quant=True \
    --batch-size=1 \
    --data-seed=42 \
    --eval-openllm=False \
    --save-model=True
```

**SSQR**

```bash
python main.py \
    --model-dir=CKPT \
    --quant-group-size=128 \
    --quant-bit-width=4 \
    --quant-order=act \
    --do-rtn=False \
    --quant-use-entropy-mode=none \
    --quant-do-clip=False \
    --quant-use-mse=True \
    --seqlen=2048 \
    --data-train-n-samples=256 \
    --do-quant=True \
    --batch-size=1 \
    --data-seed=42 \
    --eval-openllm=False \
    --outlier-percentage=0.01 \
    --save-model=True
```
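
The three commands above share most of their flags and differ only in a few method-specific ones. For scripted runs, a minimal sketch like the following could assemble them from a shared dictionary. The helper `build_command` and the `COMMON` and `METHODS` dictionaries are hypothetical names, not part of this repository. The flag names and values are copied from the examples above.

```python
import shlex

# Flags that are identical in all three example commands.
COMMON = {
    "model-dir": "CKPT",
    "quant-group-size": 128,
    "quant-order": "act",
    "do-rtn": False,
    "seqlen": 2048,
    "data-train-n-samples": 256,
    "do-quant": True,
    "batch-size": 1,
    "data-seed": 42,
    "eval-openllm": False,
    "save-model": True,
}

# Flags that differ between the methods.
METHODS = {
    "GPTQ": {"quant-bit-width": 4, "quant-use-entropy-mode": "none",
             "quant-do-clip": True, "quant-use-mse": True},
    "HPTQ": {"quant-bit-width": 4.125, "quant-use-entropy-mode": "strict_h",
             "quant-do-clip": False, "quant-use-mse": False},
    "SSQR": {"quant-bit-width": 4, "quant-use-entropy-mode": "none",
             "quant-do-clip": False, "quant-use-mse": True,
             "outlier-percentage": 0.01},
}


def build_command(method, model_dir):
    """Return the argument list for one method (hypothetical helper)."""
    flags = {**COMMON, **METHODS[method], "model-dir": model_dir}
    return ["python", "main.py"] + [f"--{k}={v}" for k, v in flags.items()]


for method in METHODS:
    # Print only; pass the list to subprocess.run(...) to actually launch it.
    print(shlex.join(build_command(method, "CKPT")))
```

Keeping the shared flags in one place makes it easier to see that the methods differ only in bit width, entropy mode, clipping, MSE grid search, and the outlier percentage.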
