# ManifoldKV: Expected Results

This document summarizes the expected results from the paper. Use these to verify your reproduction experiments.

---

## Table 1: Main RULER Benchmark Results (Llama-3.1-8B-Instruct)

| Method | Framework | Compression | RULER Accuracy |
|--------|-----------|-------------|----------------|
| **ManifoldKV** | AdaKV | 0.20 | **95.73%** |
| KeyDiff | AdaKV | 0.20 | 95.66% |
| SnapKV | AdaKV | 0.20 | 83.97% |
| KeyDiff | Standalone | 0.20 | 92.93% |

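**Key Finding**: Under the AdaKV framework at 0.20 compression, ManifoldKV achieves the best RULER accuracy in the table (95.73%), 0.07 points above KeyDiff and 11.76 points above SnapKV.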

---

## Table 2: 64K Context Recovery

| Method | 64K Accuracy | vs Global | vs KeyDiff |
|--------|--------------|-----------|------------|
| **Windowed-4K** | **84.29%** | **+49.1** | **+3.2** |
| Windowed-8K | 83.92% | +48.7 | +2.8 |
| Windowed-16K | 82.40% | +47.2 | +1.3 |
| KeyDiff | 81.09% | N/A | baseline |
| Global ManifoldKV | 35.20% | baseline | -45.9 |

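**Key Finding**: At 64K context, WindowedManifoldKV (4K window) recovers 49.1 points over Global ManifoldKV (84.29% vs 35.20%), whose accuracy collapses from centroid dilution, and also beats KeyDiff by 3.2 points.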
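The two Δ columns in Table 2 are plain percentage-point differences between 64K accuracies. As a quick arithmetic check, the sketch below recomputes them from the accuracy column; the dictionary and the `delta` helper are illustrative names, not part of the repository.

```python
# 64K accuracies copied from Table 2 (percent).
accuracy_64k = {
    "Windowed-4K": 84.29,
    "Windowed-8K": 83.92,
    "Windowed-16K": 82.40,
    "KeyDiff": 81.09,
    "Global ManifoldKV": 35.20,
}


def delta(method, baseline):
    """Percentage-point difference between two methods, rounded to one decimal."""
    return round(accuracy_64k[method] - accuracy_64k[baseline], 1)


for method in ("Windowed-4K", "Windowed-8K", "Windowed-16K"):
    print(
        f"{method}: vs Global {delta(method, 'Global ManifoldKV'):+.1f}, "
        f"vs KeyDiff {delta(method, 'KeyDiff'):+.1f}"
    )
```

If your reproduced accuracies differ slightly, the same helper can be pointed at your own numbers to see how the gaps shift.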

---

## Table 3: Compression Ratio Ablation

| Compression | Tokens Kept | RULER Accuracy |
|-------------|-------------|----------------|
| 0.10 | 90% | **95.76%** |
| 0.15 | 85% | **95.76%** |
| **0.20** | 80% | **95.73%** (recommended) |
| 0.25 | 75% | 95.64% |
| 0.30 | 70% | 95.45% |
| 0.40 | 60% | 94.75% |
| 0.50 | 50% | 92.02% |

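**Key Finding**: ManifoldKV degrades gracefully with compression: accuracy stays within 0.31 points of its best value (95.76%) up to 0.30 compression and is still 92.02% at 0.50.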

---

## Table 4: Multi-Key Retrieval (8K Context)

| Task | ManifoldKV | KeyDiff | Δ |
|------|------------|---------|---|
| **niah_multikey_3 (50%)** | **92.4%** | 77.0% | **+15.4** |
| **niah_multikey_2 (50%)** | **99.8%** | 92.6% | **+7.2** |
| niah_multikey_3 (40%) | 96.8% | 92.8% | +4.0 |
| niah_multikey_2 (40%) | 99.8% | 95.0% | +4.8 |

**Key Finding**: ManifoldKV excels at multi-key retrieval due to directional collision prevention.

---

## Table 5: Universal Manifold Structure

| Model | Head Dim | Two-NN Estimate | PCA (95%) |
|-------|----------|-----------------|-----------|
| Gemma-3-12B | 256 | **8.7 ± 2.3** | 160 (63%) |
| Qwen3-8B | 128 | **8.9 ± 0.9** | 81 (63%) |
| Ministral-8B | 128 | **8.2 ± 1.0** | 83 (65%) |
| Llama-3.1-8B | 128 | **~9** | ~80 (63%) |

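**Key Finding**: Keys occupy a universal ~9D manifold regardless of architecture: the Two-NN estimate falls between 8.2 and 9 for all four models, far below the head dimension (128 or 256).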
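For readers unfamiliar with the Two-NN column, the following is a minimal sketch of the estimator in its maximum-likelihood form, `d = N / Σ log(r2 / r1)`, where `r1` and `r2` are each point's first and second nearest-neighbor distances. It is an assumption that this variant matches what the paper used (the original Two-NN method fits a line instead), and the data here is a synthetic 2D plane embedded in 5D, not model keys.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbor distances."""
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1  # new nearest; old nearest becomes second
            elif d < r2:
                r2 = d
        if r1 > 0:  # skip exact duplicates, where the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Synthetic check: points on a 2D plane linearly embedded in 5D.
rng = random.Random(0)
points = []
for _ in range(400):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    points.append((u, v, u + v, u - v, 2 * u))

estimate = two_nn_dimension(points)
print(f"Estimated intrinsic dimension: {estimate:.2f}")
```

The estimate should land near 2 even though the ambient dimension is 5, which is the same kind of gap Table 5 reports between the Two-NN estimate and the head dimension.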

---

## Table 6: Cross-Architecture Generalization

| Model | 4K | 8K | 16K | Δ vs SnapKV |
|-------|-----|-----|------|-------------|
| Gemma-3-12B | 95.2% | 94.4% | 95.2% | +20.5 |
| Qwen3-8B | 95.0% | 94.5% | 95.0% | +7.6 |
| Ministral-8B | 95.5% | 94.9% | 95.2% | +12.6 |
| Llama-3.1-8B | 95.7% | 94.4% | 95.7% | +11.7 |

**Key Finding**: ManifoldKV achieves 94-96% across all architectures without tuning.

---

## Table 7: Distance Metric Ablation (Standalone)

| Metric | Accuracy | Δ vs Cosine |
|--------|----------|-------------|
| **L2 (Ours)** | **92.7%** | **+39.9** |
| L1 | 78.5% | +25.6 |
| L∞ | 71.2% | +18.4 |
| Cosine | 52.8% | baseline |

**Key Finding**: Any magnitude-aware metric outperforms cosine, with L2 being optimal.
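The "magnitude-aware" distinction can be illustrated with a toy example: rescaling a vector changes its L1, L2, and L∞ distance to a reference point, but leaves its cosine distance unchanged. The sketch below only demonstrates that property of the four metrics; the vectors are made-up values, and how the paper applies each metric inside ManifoldKV is not reproduced here.

```python
import math


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    return math.dist(a, b)


def linf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


# Toy vectors (hypothetical, for illustration only).
reference = (1.0, 2.0, 2.0)
key = (2.0, 1.0, 2.0)
scaled_key = tuple(3.0 * x for x in key)  # same direction, 3x the norm

for name, metric in [("L2", l2), ("L1", l1), ("Linf", linf), ("Cosine", cosine_distance)]:
    print(f"{name}: key={metric(reference, key):.4f}, scaled={metric(reference, scaled_key):.4f}")
```

Only the cosine row prints the same value twice, since it depends on direction alone.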

---

## Latency Benchmarks (8K Context)

| Method | TTFT (ms) | Tokens/s | Peak Memory |
|--------|-----------|----------|-------------|
| No Compression | 26.6 | 37.6 | 16.9 GB |
| **ManifoldKV** | **26.4** | **37.8** | 16.7 GB |
| KeyDiff | 26.4 | 37.9 | 16.7 GB |
| SnapKV | 26.8 | 37.3 | 16.7 GB |

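**Key Finding**: ManifoldKV adds no measurable latency overhead: its TTFT and throughput are within 0.2 ms and 0.2 tokens/s of the uncompressed baseline, and peak memory is 0.2 GB lower.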

---

## Statistical Significance

All reported improvements are statistically significant:
- 5 random seeds per configuration
- Standard deviation < 0.3% across all runs
- Paired t-test for ManifoldKV vs KeyDiff at 64K: p < 10^-15
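As a reminder of what a paired t-test computes, here is a minimal standard-library sketch of the t statistic on matched measurements. It assumes the two methods are evaluated on the same matched units (what exactly is paired in the paper is not stated here), the function name is hypothetical, and the inputs are toy numbers rather than results from any run.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


# Toy inputs only, chosen so the arithmetic is easy to follow by hand.
t_stat = paired_t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 0.0, 0.0, 0.0])
print(f"t = {t_stat:.4f} with {5 - 1} degrees of freedom")
```

Converting the statistic to a p-value requires the Student t distribution with `n - 1` degrees of freedom, which a statistics package can provide.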

---

## Reproduction Verification

To verify your results match the expected values:

```bash
# Run quick validation
python scripts/sanity_check.py

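# Check a specific result (substitute your run's directory for the elided path)
python -c "
import json

# Load the per-task metrics written by the evaluation run
with open('results/ruler/..../metrics.json') as f:
    m = json.load(f)

# Average the per-task string_match scores into one accuracy figure
avg = sum(v['string_match'] for v in m.values()) / len(m)
print(f'Accuracy: {avg:.2f}%')
"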
```

Results should be within ±0.5 percentage points of the expected values; differences of this size are attributable to random seed variation.
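The tolerance check can be automated. The sketch below compares a measured accuracy against the Table 1 values with the ±0.5-point tolerance; `EXPECTED_RULER`, `within_tolerance`, and the `measured` value are hypothetical names and placeholders, not part of the repository's scripts.

```python
# Expected RULER accuracies from Table 1 (percent), keyed by (method, framework).
EXPECTED_RULER = {
    ("ManifoldKV", "AdaKV"): 95.73,
    ("KeyDiff", "AdaKV"): 95.66,
    ("SnapKV", "AdaKV"): 83.97,
    ("KeyDiff", "Standalone"): 92.93,
}


def within_tolerance(measured, expected, tolerance=0.5):
    """True if measured is within `tolerance` percentage points of expected."""
    return abs(measured - expected) <= tolerance


# Placeholder: replace with the accuracy computed from your own metrics.json.
measured = 95.40
expected = EXPECTED_RULER[("ManifoldKV", "AdaKV")]
status = "OK" if within_tolerance(measured, expected) else "MISMATCH"
print(f"ManifoldKV/AdaKV: measured {measured:.2f}%, expected {expected:.2f}% -> {status}")
```

A mismatch larger than the tolerance usually points to a configuration difference (model, compression ratio, or context length) rather than seed noise.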
