# 💿 [CDPruner] Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

*A training-free visual token pruning method for MLLM inference acceleration by maximizing the conditional diversity of retained tokens.*

## 🎞️ Background

Attention-based methods retain numerous duplicate tokens, failing to achieve effective visual token compression. Similarity-based methods neglect user instructions, always pruning the same tokens and paying insufficient attention to regions most relevant to the question. CDPruner considers the conditional diversity of the selected subset, dynamically adjusting pruning according to the user instructions and retaining maximal visual information.

![comparison](assets/comparsion.png)

## 👁️ Overview

CDPruner first calculates the similarity between visual tokens, conditioned on their relevance to the current instruction. It then uses a determinantal point process (DPP) to select the subset of tokens to keep. As a training-free and model-agnostic method, it ensures both the diversity and the quality of the selected token subset, significantly reducing computational cost while maintaining strong performance.

![design](assets/design.png)
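
The two steps described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names `conditional_kernel` and `greedy_dpp_map` are hypothetical, the kernel form `diag(r) · S · diag(r)` (cosine similarity `S` scaled by relevance `r`) is an assumed quality-diversity decomposition, and greedy MAP inference is one common way to pick a DPP subset.

```python
import numpy as np


def conditional_kernel(features, relevance):
    """Build an (n, n) DPP kernel from token features and relevance scores.

    Assumption: kernel = diag(r) @ S @ diag(r), where S is the cosine
    similarity between visual tokens and r > 0 is each token's relevance
    to the instruction.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normed @ normed.T
    return relevance[:, None] * similarity * relevance[None, :]


def greedy_dpp_map(kernel, k, eps=1e-10):
    """Greedily pick up to k indices that approximately maximize det(kernel[S, S])."""
    n = kernel.shape[0]
    gains = np.diag(kernel).astype(float).copy()  # marginal gain of each token
    basis = np.zeros((k, n))                      # incremental Cholesky rows
    selected = []
    for t in range(k):
        j = int(np.argmax(gains))
        if gains[j] <= eps:  # remaining tokens add no new information
            break
        selected.append(j)
        # Project every token onto the newly selected one and shrink its gain.
        row = (kernel[j, :] - basis[:t, j] @ basis[:t, :]) / np.sqrt(gains[j])
        basis[t, :] = row
        gains = gains - row ** 2
        gains[j] = -np.inf  # never reselect the same token
    return selected


# Toy example: token 1 duplicates token 0, so it is skipped in favor of
# token 2 even though token 1 has the higher relevance score.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
relevance = np.array([1.0, 0.9, 0.8, 0.7])
kept = greedy_dpp_map(conditional_kernel(features, relevance), k=2)
print(kept)
```

A plain top-k over relevance would keep both copies of the duplicated token here, while a similarity-only selection would ignore the instruction entirely. Scaling the similarity kernel by relevance is what lets a single selection step account for both.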

## ⚙️ Setup

### 🏝️ Environment

Install necessary packages.
```bash
conda create -n cdpruner python=3.10 -y
conda activate cdpruner
pip install -e .
```

(Optional) Install FlashAttention for further inference acceleration.
```bash
pip install flash-attn --no-build-isolation
```

### 📦️ Model

Download the corresponding [LLaVA](https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md) checkpoints from [Hugging Face](https://huggingface.co/liuhaotian) 🤗:

| Version | LLM | Checkpoint |
|----------|:----------:|:-----------:|
| LLaVA-1.5 | Vicuna-7B | [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) |
| LLaVA-1.5 | Vicuna-13B | [liuhaotian/llava-v1.5-13b](https://huggingface.co/liuhaotian/llava-v1.5-13b) |
| LLaVA-1.6 (LLaVA-NeXT) | Vicuna-7B | [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) |
| LLaVA-1.6 (LLaVA-NeXT) | Vicuna-13B | [liuhaotian/llava-v1.6-vicuna-13b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-13b) |

### 📊 Data

Download each dataset according to [EVAL.md](EVAL.md).

## 📋️ Evaluation

Evaluate CDPruner on LLaVA-1.5-7B with 64 tokens retained using the GQA benchmark:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/7b/gqa.sh 64
```

Evaluate CDPruner on LLaVA-NeXT-13B with 320 tokens retained using the MME benchmark:
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_6/13b/mme.sh 320
```

## 🎟️ License

This project is released under the [Apache 2.0 license](LICENSE).
