# ManifoldKV: Geometry-Driven KV Cache Compression

**ICML 2026 Submission - Complete Reproducibility Package**

This repository contains the minimal code changes on top of [kvpress](https://github.com/NVIDIA/kvpress) to reproduce all results from the paper "ManifoldKV: Geometry-Driven KV Cache Compression".

## Table of Contents

1. [Quick Start](#quick-start)
2. [Key Innovation](#key-innovation)
3. [Installation](#installation)
4. [Paper Claims & Reproduction Commands](#paper-claims--reproduction-commands)
5. [Expected Results](#expected-results)
6. [File Structure](#file-structure)
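7. [Detailed Reproduction Guide](#detailed-reproduction-guide)
8. [Troubleshooting](#troubleshooting)
9. [Citation](#citation)
10. [License](#license)
11. [Acknowledgments](#acknowledgments)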

---

## Quick Start

```bash
# 1. Install dependencies
pip install -e .
pip install flash-attn --no-build-isolation

# 2. Reproduce main result (Table 1: ManifoldKV achieves 95.73%)
python scripts/run_ruler_main.py --method adakv_manifold_kv --context 4096

# 3. Reproduce 64K recovery (Table 2: 84.3% from 35.2%)
python scripts/run_64k_windowed.py
```

---

## Key Innovation

ManifoldKV replaces cosine similarity (used by KeyDiff) with **L2 (Euclidean) distance** for scoring token importance:

```python
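import torch
import torch.nn.functional as F

# keys follows the kvpress layout (batch, num_kv_heads, seq_len, head_dim),
# so dim=2 is the sequence axis and dim=-1 is the head dimension.

# KeyDiff (cosine similarity: ignores magnitude)
anchor = F.normalize(keys, dim=-1).mean(dim=2, keepdim=True)
scores = -F.cosine_similarity(keys, anchor, dim=-1)

# ManifoldKV (L2 distance: captures direction + magnitude)
mu = keys.mean(dim=2, keepdim=True)  # centroid of all keys in the sequence
scores = torch.norm(keys - mu, dim=-1)  # <-- 3 lines of core code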
```

This simple change yields:
- **+40 points** improvement on RULER (92.7% vs 52.8% standalone)
- **+15 points** on multi-key retrieval (directional collision prevention)
- **+49 points** recovery at 64K with windowed centroids
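
To see why magnitude matters in the scoring rule above, consider a toy case. The following is a minimal, dependency-free sketch: it uses plain Python lists instead of the tensors the press operates on, and the helper names and example vectors are illustrative assumptions, not code from this repository. Two keys point in the same direction but differ in norm, so the cosine score cannot tell them apart, while the L2 distance to the centroid can.

```python
import math

def mean_vec(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_scores(keys):
    """KeyDiff-style score: negative cosine similarity to the mean unit key."""
    anchor = normalize(mean_vec([normalize(k) for k in keys]))
    return [-sum(x * y for x, y in zip(normalize(k), anchor)) for k in keys]

def l2_scores(keys):
    """ManifoldKV-style score: Euclidean distance to the centroid."""
    mu = mean_vec(keys)
    return [math.dist(k, mu) for k in keys]

# keys[0] and keys[1] share a direction; keys[1] has 5x the norm.
keys = [[1.0, 0.0], [5.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cos = cosine_scores(keys)
l2 = l2_scores(keys)

print("cosine:", [round(s, 3) for s in cos])  # first two entries are identical
print("L2:    ", [round(s, 3) for s in l2])   # keys[1] scores higher than keys[0]
```

In this toy setting the large-norm key is an outlier only under the L2 score, which is the behavior the magnitude argument relies on.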

---

## Installation

### Requirements
- Python 3.10+
- CUDA 12.0+ (for Flash Attention 2)
- GPU with 24GB+ VRAM (for Llama-3.1-8B-Instruct)
- For 64K experiments: 80GB+ VRAM (H100/A100/B200)

### Setup

```bash
# Clone and install
git clone <this-repo>
cd icml_code_repo

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install kvpress with ManifoldKV
pip install -e .
pip install flash-attn --no-build-isolation
pip install datasets transformers torch fire tqdm pandas numpy pyyaml scipy scikit-learn matplotlib

# Verify installation
python -c "from kvpress.presses.manifold_press import ManifoldKVPress; print('ManifoldKV installed!')"

# Login to HuggingFace (for Llama models)
huggingface-cli login
```

---

## Paper Claims & Reproduction Commands

### Claim 1: ManifoldKV achieves SOTA on RULER (Table 1)

**Claim**: ManifoldKV achieves 95.73% on RULER at 4K-16K context.

```bash
# Reproduce main results
cd scripts
./run_ruler_experiments.sh

# Or run individually:
CUDA_VISIBLE_DEVICES=0 python ../evaluation/evaluate.py \
    --dataset ruler \
    --data_dir 4096 \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --press_name adakv_manifold_kv \
    --compression_ratio 0.20 \
    --output_dir ../results
```

**Expected**: 95.73% (ManifoldKV) vs 95.66% (KeyDiff) vs 83.97% (SnapKV)
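
For orientation, `--compression_ratio 0.20` follows the kvpress convention in which the ratio is the fraction of KV pairs removed, so 80% of the tokens are kept. The sketch below illustrates score-based pruning in plain Python. The helper name `keep_top_scoring` and the toy scores are hypothetical, and the real presses work on per-head tensors rather than lists.

```python
def keep_top_scoring(scores, compression_ratio):
    """Return the sorted indices of tokens kept after pruning the lowest scores."""
    n_kept = int(len(scores) * (1 - compression_ratio))
    # Rank token indices from highest to lowest score, then keep the top n_kept.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_kept])

# Ten toy token scores; a ratio of 0.20 removes the two lowest-scoring tokens.
scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6, 0.5, 0.05]
kept = keep_top_scoring(scores, compression_ratio=0.20)
print(kept)
```

Here the tokens at indices 1 and 9 have the lowest scores, so they are the ones dropped.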

---

### Claim 2: 49-Point Recovery at 64K (Table 2)

**Claim**: WindowedManifoldKV recovers from 35.2% to 84.3% at 64K context.

```bash
# Run 64K benchmark with all variants
python scripts/run_64k_windowed.py

# Or run specific methods:
python scripts/run_64k_windowed.py --method windowed_4k --window_size 4096
python scripts/run_64k_windowed.py --method global  # Should get ~35%
```

**Expected**:

| Method | Accuracy | Recovery |
|--------|----------|----------|
| Global ManifoldKV | 35.2% | Baseline (centroid dilution) |
| Windowed-4K | **84.3%** | **+49.1 points** |
| Windowed-8K | 83.9% | +48.7 points |
| KeyDiff | 81.1% | N/A |
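
The centroid-dilution effect named in the table can be illustrated with a toy sketch. It assumes that the windowed variant scores each key against the mean of its own fixed-size window rather than the mean of the whole sequence. That reading is inferred from the method name and the `--window_size` flag, and the function names below are hypothetical, not the repository's implementation.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def global_scores(keys):
    """L2 distance of every key to a single sequence-wide centroid."""
    mu = centroid(keys)
    return [math.dist(k, mu) for k in keys]

def windowed_scores(keys, window_size):
    """L2 distance of every key to the centroid of its own window."""
    scores = []
    for start in range(0, len(keys), window_size):
        window = keys[start:start + window_size]
        mu = centroid(window)
        scores.extend(math.dist(k, mu) for k in window)
    return scores

# Segment 1 sits near the origin with one local outlier (index 3);
# segment 2 sits far away and drags the global centroid toward it.
keys = [[0.0, 0.0] for _ in range(3)] + [[1.0, 0.0]] + [[10.0, 10.0] for _ in range(4)]

g = global_scores(keys)
w = windowed_scores(keys, window_size=4)
print("global:  ", [round(s, 2) for s in g])
print("windowed:", [round(s, 2) for s in w])
```

With a single global centroid, the local outlier at index 3 receives the lowest score of all keys. With per-window centroids it receives the highest.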

---

### Claim 3: +15 Points on Multi-Key Retrieval (Table 4)

**Claim**: ManifoldKV outperforms KeyDiff by 15.4 points on `niah_multikey_3` at 50% compression.

```bash
# Run multi-key experiments
python scripts/run_multikey_ablation.py

# Specific task
python scripts/run_ruler_main.py \
    --method adakv_manifold_kv \
    --compression_ratio 0.50 \
    --tasks niah_multikey_2,niah_multikey_3
```

**Expected**:

| Task | ManifoldKV | KeyDiff | Δ |
|------|------------|---------|---|
| multikey_3 (50%) | **92.4%** | 77.0% | **+15.4** |
| multikey_2 (50%) | **99.8%** | 92.6% | **+7.2** |

---

### Claim 4: Universal ~9D Manifold Structure (Table 5)

**Claim**: Key vectors occupy a ~9-dimensional manifold regardless of architecture.

```bash
# Run manifold dimension analysis
python scripts/analyze_manifold.py --model meta-llama/Meta-Llama-3.1-8B-Instruct
python scripts/analyze_manifold.py --model Qwen/Qwen3-8B
python scripts/analyze_manifold.py --model google/gemma-3-12b-it
python scripts/analyze_manifold.py --model mistralai/Ministral-8B-Instruct-2410
```

**Expected**:

| Model | Head Dim | Two-NN Estimate |
|-------|----------|-----------------|
| Gemma-3-12B | 256 | **8.7 ± 2.3** |
| Qwen3-8B | 128 | **8.9 ± 0.9** |
| Ministral-8B | 128 | **8.2 ± 1.0** |
| Llama-3.1-8B | 128 | **~9** |
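
For readers unfamiliar with the Two-NN estimator reported in the table, the sketch below shows its maximum-likelihood form. For each point, take the ratio `mu = r2 / r1` of the distances to its second and first nearest neighbors, then estimate `d = N / sum(log(mu_i))`. This is a generic illustration on synthetic data and does not reproduce `scripts/analyze_manifold.py`; the function name `two_nn_dimension` and the test data are assumptions.

```python
import math
import random

def two_nn_dimension(points):
    """Maximum-likelihood Two-NN estimate of intrinsic dimension."""
    log_ratios = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, where the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)

# Synthetic check: a 2D plane linearly embedded in 5D ambient space.
random.seed(0)
points = []
for _ in range(400):
    x, y = random.random(), random.random()
    points.append([x, y, x + y, x - y, 2.0 * x])

estimate = two_nn_dimension(points)
print(f"ambient dim = 5, estimated intrinsic dim = {estimate:.2f}")
```

The brute-force neighbor search is quadratic in the number of points, which is fine for a few hundred samples but would need a nearest-neighbor index for full key caches.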

---

### Claim 5: Cross-Architecture Generalization (Table 6)

**Claim**: ManifoldKV achieves 94-96% across all architectures without tuning.

```bash
# Run multi-model experiments
./scripts/run_multimodel_experiments.sh

# Or individual models:
python scripts/run_ruler_main.py --model Qwen/Qwen3-8B
python scripts/run_ruler_main.py --model google/gemma-3-12b-it
python scripts/run_ruler_main.py --model mistralai/Ministral-8B-Instruct-2410
```

**Expected**:

| Model | 4K | 8K | 16K |
|-------|-----|-----|------|
| Gemma-3-12B | 95.2% | 94.4% | 95.2% |
| Qwen3-8B | 95.0% | 94.5% | 95.0% |
| Ministral-8B | 95.5% | 94.9% | 95.2% |

---

## Expected Results

### Main Benchmark Results (Llama-3.1-8B-Instruct)

| Method | Framework | Compression | Accuracy |
|--------|-----------|-------------|----------|
| **ManifoldKV** | AdaKV | 0.20 | **95.73%** |
| KeyDiff | AdaKV | 0.20 | 95.66% |
| SnapKV | AdaKV | 0.20 | 83.97% |
| KeyDiff | Standalone | 0.20 | 92.93% |

### 64K Context Recovery

| Method | Accuracy | vs KeyDiff |
|--------|----------|------------|
| Windowed-4K | **84.29%** | **+3.2** |
| Windowed-8K | 83.92% | +2.8 |
| Windowed-16K | 82.40% | +1.3 |
| KeyDiff | 81.09% | baseline |
| Global ManifoldKV | 35.2% | -45.9 |

---

## File Structure

```
icml_code_repo/
├── README.md                     # This file
├── pyproject.toml                # Package configuration
│
├── kvpress/                      # Core implementation (changes to kvpress)
│   ├── __init__.py               # Exports ManifoldKV presses
│   ├── presses/
│   │   ├── manifold_press.py     # ⭐ MAIN CONTRIBUTION: ManifoldKV variants
│   │   ├── adakv_press.py        # AdaKV wrapper (unchanged)
│   │   ├── keydiff_press.py      # KeyDiff baseline (unchanged)
│   │   ├── scorer_press.py       # Base class (unchanged)
│   │   └── ...                   # Other presses
│   └── pipeline.py               # KVPress pipeline
│
├── evaluation/                   # Evaluation infrastructure
│   ├── evaluate.py               # Main evaluation script
│   ├── evaluate_registry.py      # Press and dataset registries
│   └── benchmarks/
│       ├── ruler/                # RULER benchmark
│       └── longbench/            # LongBench benchmark
│
├── scripts/                      # Reproduction scripts
│   ├── run_ruler_experiments.sh  # Main RULER reproduction
│   ├── run_64k_windowed.py       # 64K windowed experiments
│   ├── run_multikey_ablation.py  # Multi-key retrieval ablation
│   ├── run_multimodel_experiments.sh # Cross-architecture
│   ├── analyze_manifold.py       # Manifold dimension analysis
│   └── analyze_latency.py        # Latency benchmarks
│
├── results/                      # Pre-computed results (for verification)
│   ├── ruler/                    # RULER benchmark results
│   ├── theory/                   # Theoretical analysis results
│   └── multimodel/               # Multi-model results
│
├── figures/                      # Paper figures
│   └── ...
│
└── notebooks/                    # Interactive demos
    ├── manifold_demo.ipynb       # ManifoldKV walkthrough
    └── visualization.ipynb       # Result visualization
```

---

## Detailed Reproduction Guide

### Step 1: Validate Installation

```bash
# Quick sanity check (should complete in ~1 minute)
python scripts/sanity_check.py
```

This runs a minimal test to verify:
- Model loads correctly
- ManifoldKV press works
- Evaluation pipeline functions

### Step 2: Reproduce Main Results (Table 1)

```bash
# Run full RULER benchmark across all context lengths
# Estimated time: 4-6 hours on single GPU

./scripts/run_ruler_experiments.sh

# Results will be saved to results/ruler/
```

### Step 3: Reproduce 64K Results (Table 2)

```bash
# Run 64K windowed experiments
# Requires 80GB+ GPU memory
# Estimated time: 8-10 hours

python scripts/run_64k_windowed.py --full

# For quick validation (100 samples):
python scripts/run_64k_windowed.py --max_samples 100
```

### Step 4: Run Multi-Model Validation (Table 6)

```bash
# Run across all models
# Estimated time: 12-18 hours total

./scripts/run_multimodel_experiments.sh

# Or parallel on 8 GPUs:
python scripts/launch_parallel.py --config configs/multimodel.yaml
```

### Step 5: Generate Paper Figures

```bash
# Generate all figures from results
python scripts/generate_figures.py --results_dir results/

# Figures saved to figures/
```

---

## Troubleshooting

### Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| `torch.cuda.OutOfMemoryError` | GPU OOM | Reduce batch size or use model sharding |
| Model outputs garbage at 64K | AdaKV bug | Use `WindowedManifoldKVPress` directly |
| Accuracy ~35% at 64K | Centroid dilution | Use `WindowedManifoldKVPress(window_size=4096)` |
| `ModuleNotFoundError` | Wrong path | Run `pip install -e .` in icml_code_repo/ |

### Verifying Results

Each experiment saves:
- `config.yaml`: Experiment configuration
- `predictions.csv`: Raw model outputs
- `metrics.json`: Computed metrics

To verify a specific result:
```bash
python -m json.tool results/ruler/ruler__4096__...__adakv_manifold_kv__0.20/metrics.json
```
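
To compare several runs at once, a short script can gather every `metrics.json` under a results directory. This is a sketch under assumptions: the helper name `collect_metrics` is hypothetical, and because the schema of `metrics.json` is not documented here, the script only loads and prints whatever JSON each file contains.

```python
import json
from pathlib import Path

def collect_metrics(results_dir):
    """Map each run directory name to the parsed contents of its metrics.json."""
    metrics = {}
    for path in sorted(Path(results_dir).rglob("metrics.json")):
        with path.open() as f:
            metrics[path.parent.name] = json.load(f)
    return metrics

if __name__ == "__main__":
    # Print one JSON blob per run found under results/ruler/.
    for run, values in collect_metrics("results/ruler").items():
        print(run)
        print(json.dumps(values, indent=2))
```

Keying by the run directory name keeps the method and compression ratio visible, since both are encoded in the directory name shown in the command above.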

---

## Citation

```bibtex
@inproceedings{manifoldkv2026,
  title={ManifoldKV: Geometry-Driven KV Cache Compression},
  author={Anonymous},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}
```

---

## License

Apache 2.0

---

## Acknowledgments

Built on top of [kvpress](https://github.com/NVIDIA/kvpress) by NVIDIA Research.
