# SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference

This repository implements the method proposed in the paper **"SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference"**, which improves the efficiency of large language model (LLM) inference by using a small auxiliary model to compensate for key-value (KV) cache compression.

---

## 🚀 Getting Started

### 1. Create and Activate Conda Environment

To set up the required dependencies, run:

```bash
conda env create -f environment.yml
conda activate smallkv
```

This installs all packages and libraries required to run the experiments.

---

## 📊 Running Evaluation Tasks

### 2. Evaluate on GSM8K Dataset

Run the following command to evaluate on the **GSM8K** dataset:

```bash
python gsm8k_test.py \
    --big_model_path qwen/Qwen2-7B \
    --small_model_path qwen/Qwen2-0___5B \
    --gsm8k_path data/gsm8k \
    --prompt_file prompt_hardest.txt \
    --heavy_budget_ratio 0.2 \
    --recent_budget_ratio 0.2 \
    --compensate_budget_ratio 0.2 \
    --model_series qwen
```

#### 🔍 Parameter Explanation:

| Parameter | Description |
|----------|-------------|
| `--big_model_path` | Path or name of the large model (e.g., Qwen2-7B) |
| `--small_model_path` | Path or name of the small assisting model (e.g., Qwen2-0.5B) |
| `--gsm8k_path` | Path to the GSM8K dataset directory |
| `--prompt_file` | File containing the prompt template used for input formatting |
| `--heavy_budget_ratio` | Fraction of attention keys/values kept for critical tokens |
| `--recent_budget_ratio` | Fraction of attention keys/values kept for the most recent tokens |
| `--compensate_budget_ratio` | Fraction of attention keys/values allocated to marginal tokens (see the paper for details) |
| `--model_series` | Model family (e.g., `qwen` or `llama`) |

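The three budget ratios jointly determine how much of the KV cache is retained. The sketch below shows one plausible way to turn them into token counts. It is an illustration under stated assumptions, not SmallKV's actual code: the helper `kv_budget` and the `KVBudget` class are hypothetical, and it assumes each ratio is applied to the current sequence length and truncated with `int()` (as in H2O-style implementations) and that the three ratios together should not exceed 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVBudget:
    """Hypothetical: token counts kept in the compressed KV cache, per category."""
    heavy: int       # critical tokens (--heavy_budget_ratio)
    recent: int      # most recent tokens (--recent_budget_ratio)
    compensate: int  # marginal tokens (--compensate_budget_ratio)

    @property
    def total(self) -> int:
        return self.heavy + self.recent + self.compensate


def kv_budget(seq_len: int, heavy_ratio: float, recent_ratio: float,
              compensate_ratio: float) -> KVBudget:
    """Hypothetical helper: convert the three CLI ratios into token counts.

    ASSUMPTION: each ratio is taken relative to the current sequence length
    and truncated with int(). SmallKV's real reference length and rounding
    may differ.
    """
    ratios = {"heavy": heavy_ratio, "recent": recent_ratio,
              "compensate": compensate_ratio}
    for name, r in ratios.items():
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"{name} ratio must be in [0, 1], got {r}")
    # ASSUMPTION: the combined budgets cannot exceed the full cache.
    if sum(ratios.values()) > 1.0:
        raise ValueError("budget ratios sum to more than 1.0")
    return KVBudget(heavy=int(heavy_ratio * seq_len),
                    recent=int(recent_ratio * seq_len),
                    compensate=int(compensate_ratio * seq_len))


if __name__ == "__main__":
    # The example settings from the commands above: 0.2 for each ratio.
    budget = kv_budget(1000, 0.2, 0.2, 0.2)
    print(budget, "total:", budget.total)
```

Under these assumptions, a 1,000-token sequence with the example settings keeps 200 + 200 + 200 = 600 entries, i.e. 60% of the full cache.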
---

### 3. Evaluate on BBH Dataset

Run the following command to evaluate on the **BBH** (BIG-Bench Hard) dataset:

```bash
python bbh_test.py \
    --big_model_path qwen/Qwen2-7B-Instruct \
    --small_model_path qwen/Qwen2-0___5B \
    --bbh_path data/bbh \
    --prompt_path lib_prompt \
    --heavy_budget_ratio 0.2 \
    --recent_budget_ratio 0.2 \
    --compensate_budget_ratio 0.2 \
    --model_series qwen
```

#### 🔍 Parameter Explanation:

The remaining parameters have the same meaning as in the GSM8K evaluation; `--gsm8k_path` and `--prompt_file` are replaced by:

- `--bbh_path`: path to the BBH dataset directory.
- `--prompt_path`: path to the BBH prompt library (`lib_prompt` in the example above).
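Because both scripts share the model and budget flags and differ only in their dataset-specific flags, a small wrapper makes it easy to sweep budget settings across either benchmark. The sketch below is a hypothetical convenience helper (`build_eval_command` is not part of this repository); it only emits flags shown in this README and prints the commands rather than running them.

```python
import shlex
from typing import Dict, List


def build_eval_command(script: str, big_model_path: str, small_model_path: str,
                       dataset_flags: Dict[str, str], heavy: float = 0.2,
                       recent: float = 0.2, compensate: float = 0.2,
                       model_series: str = "qwen") -> List[str]:
    """Hypothetical helper: build an argv list for gsm8k_test.py or bbh_test.py.

    dataset_flags holds the script-specific flags, e.g.
    {"--gsm8k_path": ..., "--prompt_file": ...} or
    {"--bbh_path": ..., "--prompt_path": ...}.
    """
    cmd = ["python", script,
           "--big_model_path", big_model_path,
           "--small_model_path", small_model_path]
    for flag, value in dataset_flags.items():  # dicts keep insertion order
        cmd += [flag, value]
    cmd += ["--heavy_budget_ratio", str(heavy),
            "--recent_budget_ratio", str(recent),
            "--compensate_budget_ratio", str(compensate),
            "--model_series", model_series]
    return cmd


if __name__ == "__main__":
    # Print (not execute) a small sweep over the compensate budget on BBH.
    for comp in (0.1, 0.2):
        cmd = build_eval_command(
            "bbh_test.py", "qwen/Qwen2-7B-Instruct", "qwen/Qwen2-0___5B",
            {"--bbh_path": "data/bbh", "--prompt_path": "lib_prompt"},
            compensate=comp)
        print(shlex.join(cmd))
```

To actually launch a run, pass the returned list to `subprocess.run(cmd, check=True)`; using an argument list instead of a single shell string avoids quoting problems with paths.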





