# Quantisation benchmarking

Usage:

```sh
python src/qbench.py
python src/qmodels.py

# Long runs
python src/qbench.py --exclude '' -b 16 8 4 2 1 -k 8192 6144 4096 3072
python src/qmodels.py --model custom-llama-4B custom-llama-12B custom-llama-31B --batch-size 1 4 16 64 256 --kernel triton marlin-lut marlin torch.compile
```

First-time setup:

```sh
sudo apt install ninja-build pybind11-dev
uv sync --extra dev
echo 'export PYTHONPATH="$(dirname "${VIRTUAL_ENV}")/src"' >> .venv/bin/activate
```
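The last setup line makes the virtualenv's `activate` script export `PYTHONPATH` as the `src` directory next to `.venv`. As a rough illustration of how that expression resolves, the sketch below evaluates it by hand; the path `/tmp/qbench-demo` is a hypothetical checkout location, not something the repository defines.

```sh
# Hypothetical checkout location, for illustration only.
# On activation, VIRTUAL_ENV holds the absolute path of the .venv directory.
VIRTUAL_ENV="/tmp/qbench-demo/.venv"

# Same expression the setup step appends to .venv/bin/activate:
# strip the trailing ".venv" component, then append "src".
PYTHONPATH="$(dirname "${VIRTUAL_ENV}")/src"

echo "$PYTHONPATH"   # prints /tmp/qbench-demo/src
```

In a real checkout, running `echo "$PYTHONPATH"` after re-activating the environment should show the repository's own `src` directory.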

## Credits

Includes code adapted from [IST-DASLab/marlin](https://github.com/IST-DASLab/marlin).
