KVpop - Retrofitting LLMs with xLSTM-guided Token Eviction
Abstract: Key-value (KV) cache growth is a major bottleneck in autoregressive decoding, as memory and bandwidth scale linearly with the context length.
Existing KV-eviction methods often rely on static heuristics or make retention decisions early, before downstream context is available, which leads to brittle eviction as token relevance shifts.
To address this, we introduce KVpop, which uses a stateful xLSTM scorer to perform delayed importance scoring at eviction time, enabling context-aware cache management under a fixed per-head budget.
KVpop is trained with a novel future-attention importance target that estimates long-term token utility without materializing dense attention.
We show that KVpop preserves dense attention performance on challenging mathematical reasoning tasks, while reducing KV cache size by 75%. Even at ~94% KV compression, KVpop still retains ~80% of performance, almost doubling baseline recovery.
Our results indicate that timing-aware eviction cuts KV memory costs while maintaining quality.
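The core mechanism (scoring resident tokens at eviction time, under a fixed per-head budget) can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the class name, the `score_fn` interface, and the eviction loop are all hypothetical, and in KVpop a trained xLSTM scorer would replace the placeholder scoring function.

```python
class EvictingKVCache:
    """Toy fixed-budget KV cache for one attention head.

    On overflow, every resident token is (re-)scored at eviction time,
    so the decision reflects context accumulated since insertion --
    the "delayed" part of delayed importance scoring.
    """

    def __init__(self, budget, score_fn):
        self.budget = budget        # max cached entries for this head
        self.score_fn = score_fn    # (position, (key, value)) -> importance
        self.cache = {}             # position -> (key, value); dicts keep insertion order

    def insert(self, pos, key, value):
        self.cache[pos] = (key, value)
        if len(self.cache) > self.budget:
            # Delayed decision: score all resident tokens now, at
            # eviction time, rather than when they were first inserted.
            victim = min(self.cache, key=lambda p: self.score_fn(p, self.cache[p]))
            del self.cache[victim]


# Toy usage: treat the stored value as the importance estimate and
# evict the lowest-scoring token once the budget is exceeded.
cache = EvictingKVCache(budget=2, score_fn=lambda p, kv: kv[1])
cache.insert(0, "k0", 5.0)
cache.insert(1, "k1", 1.0)
cache.insert(2, "k2", 3.0)   # over budget: position 1 (score 1.0) is evicted
```

The key design point the sketch illustrates is *when* scoring happens: a static heuristic would fix each token's fate at insertion, whereas eviction-time scoring lets later context change which tokens are deemed expendable.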
Submission Number: 64