Minimal-Intervention KV Retention via Set-Conditioned Diversity

TMLR Paper8990 Authors

17 May 2026 (modified: 31 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: KV-cache compression at small budgets is a crowded design space spanning cache representation, head-wise routing, compression cadence, decoding behavior, and within-budget scoring. We study seven mechanisms across these five families on long-form mathematical reasoning (MATH-500~\citep{hendrycks2021math}) at budgets $b \in \{64, 128\}$, under an evaluation standard that tightened over the study and converged on matched mean cache size with $n \geq 200$ on two distilled-reasoning models (the Qwen-7B and Llama-8B variants of DeepSeek-R1-Distill~\citep{deepseek2025r1}). All seven were rejected as catalogue directions, one on screening-grade evidence. We then propose $\alpha$, a one-function modification to the TriAttention~\citep{mao2026triattention} retention scorer that replaces argmax-top-$k$ with greedy, facility-location-inspired selection under a V-space redundancy penalty controlled by a single weight $\lambda$. A pre-registered protocol tunes $\lambda$ on a frozen development split and confirms on a disjoint held-out split; with $\lambda = 0.5$, $\alpha$ clears the Bonferroni-corrected threshold on two of the four (model, budget) cells (Qwen $b{=}128$ and Llama $b{=}64$), no cell is significantly negative, and the pre-registered Branch~A triggers. The finding is asymmetric: the surviving mechanism was among the smallest tested, but minimality alone did not predict survival, since two comparably small scoring modifications were also rejected. What distinguished $\alpha$ was therefore not its size but its set-conditioned selection rule, in which each retention decision depends on the already-retained set. The combined matched-memory, sympy-graded, held-out confirmation protocol is the evidence standard that made the asymmetry visible.
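To make the set-conditioned selection rule concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact form of the redundancy penalty, so this sketch assumes it is the maximum cosine similarity, clipped at zero, between a candidate's value vector and the value vectors already retained. The names `select_diverse` and `cosine` and the toy inputs are hypothetical. With `lam = 0` the loop reduces to ordinary top-$k$ by score.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def select_diverse(scores, values, budget, lam=0.5):
    """Greedily pick `budget` token indices to retain.

    scores : per-token retention scores from the base scorer (assumed given).
    values : per-token value vectors, used only to measure redundancy.
    lam    : weight of the redundancy penalty (hypothetical form, see lead-in).
    """
    n = len(scores)
    budget = min(budget, n)
    selected = []
    remaining = set(range(n))
    # Highest similarity of each candidate to any retained token so far.
    # Starting at 0.0 means negative similarities are never rewarded.
    max_sim = [0.0] * n

    while len(selected) < budget:
        # Each decision depends on the already-retained set through max_sim.
        # Iterating in sorted order breaks ties toward the lowest index.
        best = max(sorted(remaining),
                   key=lambda i: scores[i] - lam * max_sim[i])
        selected.append(best)
        remaining.remove(best)
        # Update redundancy of the remaining candidates against the new pick.
        for i in remaining:
            sim = cosine(values[i], values[best])
            if sim > max_sim[i]:
                max_sim[i] = sim
    return selected


# Toy example: tokens 0 and 1 carry identical value vectors.
toy_scores = [1.0, 0.9, 0.5]
toy_values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
diverse = select_diverse(toy_scores, toy_values, budget=2, lam=0.5)
plain_topk = select_diverse(toy_scores, toy_values, budget=2, lam=0.0)
```

In the toy example, plain top-$k$ keeps the two redundant tokens, while the penalized rule swaps the second one for the lower-scored but distinct token. That swap is the dependence on the already-retained set that the abstract describes.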
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yingnian_Wu1
Submission Number: 8990