# Towards Efficient Post-Training Quantization For Large Vision-Language Models Via Token-Wise Redundancy Elimination

## Install

1. Clone this repository and navigate to the Open-VLMQ folder
```bash
git clone --recurse-submodules this/repo/Open-VLMQ.git
cd Open-VLMQ
```

2. Download datasets

Follow the [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md#prepare-images) data guide to prepare the images.

3. Install third-party repositories and packages
```bash
# third-party repositories
pip install -e ./third-party/LLaVA-NeXT
pip install -e ./third-party/lmms-eval
# packages
pip install -r requirements.txt
pip install fast-hadamard-transform==1.0.4.post1
pip install flash-attn --no-build-isolation
# install VLMQ
pip install -e .
```

## Quick Start

1. Run quantization (currently supports GPTQ/GPTAQ/VLMQ)

The following example runs VLMQ on a Qwen2-VL model and saves the fake-quantized model weights.
```bash
MODEL=qwen2_vl
PRETRAINED=/your/model/path

python -m vlmq.w_only_quantize \
    --model $MODEL \
    --model_args=pretrained=$PRETRAINED,max_pixels=2359296,use_flash_attention_2=True \
    --batch_size 1 \
    --method vlmq \
    --percdamp 0.01 \
    --act_order \
    --n_samples 512 \
    --seqlen 512 \
    --w_bits 3 \
    --w_groupsize -1 \
    --w_clip \
    --a_bits 16 \
    --v_bits 16 \
    --k_bits 16 \
    --k_asym \
    --v_asym \
    --w_asym \
    --a_asym \
    --a_clip_ratio 0.9 \
    --k_clip_ratio 0.95 \
    --v_clip_ratio 0.95 \
    --grad_from attn_out \
    --grad_acton qkvo \
    --grad_norm l1 \
    --grad_clip
```
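
To make the term "fake quantized" concrete, the sketch below quantizes a small weight matrix to 3 bits and immediately dequantizes it back to floats, so the result keeps its original shape and dtype but only takes a handful of distinct values per row. It assumes that `--w_bits 3 --w_groupsize -1 --w_asym` follow the common GPTQ-style convention (3-bit, one scale and zero point per output channel, asymmetric range); that reading is an assumption, not something this README states. The sketch uses plain round-to-nearest, not the VLMQ or GPTQ algorithm, and `fake_quantize_row` and `fake_quantize` are hypothetical helper names that are not part of this repository.

```python
def fake_quantize_row(row, bits=3):
    """Asymmetric round-to-nearest fake quantization of one weight row.

    Illustrative only: this is NOT the VLMQ/GPTQ update rule.
    """
    qmax = 2 ** bits - 1                    # 3 bits -> integer levels 0..7
    lo = min(min(row), 0.0)                 # extend the range so that
    hi = max(max(row), 0.0)                 # zero is always representable
    scale = (hi - lo) / qmax
    if scale == 0.0:                        # guard against an all-zero row
        scale = 1.0
    zero = round(-lo / scale)               # integer zero point
    out = []
    for w in row:
        q = min(max(round(w / scale) + zero, 0), qmax)  # quantize and clamp
        out.append((q - zero) * scale)                  # dequantize to float
    return out


def fake_quantize(weight, bits=3):
    """Per-output-channel variant: each row gets its own scale and zero point."""
    return [fake_quantize_row(row, bits) for row in weight]


if __name__ == "__main__":
    weight = [
        [-0.7, -0.1, 0.0, 0.35, 0.7],
        [0.02, 0.04, 0.06, 0.08, 0.10],
    ]
    for original, quantized in zip(weight, fake_quantize(weight, bits=3)):
        print("original :", original)
        print("quantized:", [round(v, 4) for v in quantized])
```

Because the weights stay in floating point, a fake-quantized checkpoint can be loaded and evaluated like the original model; it reflects the accuracy impact of quantization but does not by itself reduce memory use.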

2. Run evaluation
```bash
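TASKS=docvqa_val
interleave_visuals=False
num_processes=1  # placeholder value: set to the number of processes to launch (typically one per GPU)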

PRETRAINED_LIST=(
    "your/model/path1"
    "your/model/path2"
)

for PRETRAINED in "${PRETRAINED_LIST[@]}"; do
    MODEL=qwen2_vl
    LOG_SUFFIX=reproduce
    OUTPUT_PATH="${PRETRAINED}/logs"

    accelerate launch --num_processes=$num_processes --main_process_port=12345 -m vlmq.lmms_eval_entry \
        --model $MODEL \
        --model_args=pretrained=$PRETRAINED,max_pixels=2359296,use_flash_attention_2=True \
        --tasks $TASKS \
        --batch_size 1 \
        --log_samples --log_samples_suffix $LOG_SUFFIX \
        --output_path $OUTPUT_PATH
done
```
