# QERA: Analytical Solution to Quantization Error Reconstruction


## Env Setup

```bash
conda env create -f environment.yml
conda run -n loqer python -m pip install -r requirements.txt
```

## Experiments

1. For QPEFT experiments of QLoRA, LoftQ, QERA on GLUE, refer to `experiments/qpeft/glue`
2. For QPEFT experiments of QLoRA, LoftQ, QERA on SlimPajama, refer to `experiments/qpeft/slimpajama`
3. For QPEFT experiments of QLoRA, LoftQ, QERA on GSM8K, refer to `experiments/qpeft/gsm8k`
4. For PTQ experiments of ZeroQuantV2, LQER, QERA, refer to `experiments/ptq`
5. For all figures in the paper, refer to `experiments/plots`

## EntryPoint Usage

1. MXINT4 weight-only quantization, with low-rank error reconstruction disabled (`--disable-loqer`)

    ```bash
    python ptq_pipeline.py ./experiments/configs/w-only-uniform-rank.yaml --disable-loqer --disable-perplexity-eval
    ```

    The config template `w-only-uniform-rank.yaml` uses TinyLlama, with a subset of SlimPajama for calibration and WikiText2 for perplexity evaluation.

2. MXINT4 weight, scale = identity matrix (ZeroQuantV2)

    ```bash
    python ptq_pipeline.py ./experiments/configs/w-s-uniform-rank.yaml --loqer-scaling-mode identity --disable-perplexity-eval
    ```

    "scale = identity matrix" means that we simply apply SVD to the quantization error: $\mathrm{SVD}(W - W_q)$.

3. MXINT weight, scale = activation-induced diagonal matrix, derived by assuming $E[x_i x_j] = 0$ for $i\neq j$.

    ```bash
    python ptq_pipeline.py ./experiments/configs/w-s-activation-rank.yaml --loqer-scaling-mode diag --disable-perplexity-eval
    ```

4. MXINT weight, scale = auto-correlation matrix of the activation vectors, derived without assuming $E[x_i x_j] = 0$ for $i\neq j$.

    ```bash
    python ptq_pipeline.py ./experiments/configs/w-s-activation-rank.yaml --loqer-scaling-mode rxx --disable-perplexity-eval
    ```
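
The three scaling modes above differ only in the matrix $S$ applied to the quantization error before the truncated SVD. The NumPy sketch below illustrates that idea and is not the code in `ptq_pipeline.py`. It assumes a layer computing $y = xW$, a correction of the form $S^{-1}\,\mathrm{SVD}_k(S(W - W_q))$, a diagonal scale $\mathrm{diag}(\sqrt{E[x_i^2]})$ (what $R_{xx}^{1/2}$ reduces to when $E[x_i x_j] = 0$ for $i\neq j$), and a full-rank $R_{xx}$. All function names are hypothetical, and how the `identity`, `diag`, and `rxx` flag values map onto these formulas is our reading of the descriptions above.

```python
import numpy as np


def low_rank_correction(w, w_q, rank, scale=None):
    """Rank-`rank` correction C = S^{-1} SVD_k(S (W - W_q)).

    w, w_q: (in_features, out_features); the layer is assumed to compute y = x @ w.
    scale:  invertible (in_features, in_features) matrix S; None means identity.
    """
    err = w - w_q
    if scale is None:
        scale = np.eye(w.shape[0])
    u, s, vt = np.linalg.svd(scale @ err, full_matrices=False)
    truncated = (u[:, :rank] * s[:rank]) @ vt[:rank]  # best rank-k approx of S @ err
    return np.linalg.solve(scale, truncated)  # undo the scaling: S^{-1} @ truncated


def diag_scale(x):
    """Diagonal scale diag(sqrt(E[x_i^2])) estimated from calibration rows x."""
    return np.diag(np.sqrt(np.mean(x**2, axis=0)))


def rxx_sqrt_scale(x):
    """Symmetric square root of the auto-correlation matrix R_xx = E[x^T x]."""
    rxx = x.T @ x / x.shape[0]
    evals, evecs = np.linalg.eigh(rxx)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative eigenvalues
    return (evecs * np.sqrt(evals)) @ evecs.T


def output_error(x, w, w_q, c):
    """Mean squared layer-output error after adding the correction c to w_q."""
    return np.mean(np.sum((x @ (w - w_q - c)) ** 2, axis=1))
```

On the calibration activations themselves, the `rxx_sqrt_scale` correction minimizes `output_error` among all corrections of the given rank, so its error is never larger than that of the identity or diagonal scale. To try the sketch, any toy quantizer can stand in for MXINT4, for example `w_q = np.round(w * 2) / 2`.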



