Cache You Later: Post-Compression KV Repair for Long-Context Agentic LLM Inference

Published: 01 Jun 2026, Last Modified: 11 Jun 2026
Venue: AdaptFM Poster
License: CC BY 4.0
Keywords: KV cache, KV cache compression, LLM inference, long-context language models, multi-turn LLM, efficient inference, agentic AI, attention
TL;DR: We propose RepairKV, a runtime mechanism that restores previously evicted (offloaded) tokens to the active KV cache when recent context makes them relevant again.
Abstract: As large language models (LLMs) support longer contexts and more capable agentic sessions, what matters in the accumulated context can change with each new query, tool output, or test result. At the same time, inference still depends on a limited active key-value (KV) cache, and many existing KV-cache compression methods decide, based on the current prefix, which past tokens stay active. As a result, tokens that matter in later turns may already have been evicted or compressed away by earlier compression decisions. We argue that the KV cache should be multi-tiered and governed by a two-way mechanism of eviction and repair, rather than eviction alone. We propose RepairKV, a runtime operator for multi-turn agentic inference that re-evaluates which evicted KV rows are currently important and restores a budgeted subset of them from host memory back into the GPU KV cache. Controlled experiments show that restoring evicted tokens with RepairKV improves retrieval accuracy, and the effect persists across relevance changes and across different initial eviction policies; a preliminary diagnostic on open-source repositories shows the same pattern. On Qwen2.5-7B-Instruct at 32K context, RepairKV achieves 91.0% retrieval accuracy on a four-query needle-in-a-haystack task, versus 24.5% for the matched no-repair baseline at the same active-cache budget, while promoting only 96 tokens.
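The abstract does not give RepairKV's scoring rule, data layout, or promotion policy, so the following is only a minimal Python sketch of the general evict-and-repair idea under our own assumptions. Every name in it (`TwoTierKVCache`, `relevance`, `evict`, `repair`, `capacity`, `budget`) is hypothetical. The `active` dict stands in for the GPU KV cache and `offloaded` for host memory; relevance is approximated as the best scaled dot product between a cached key and a few recent query vectors, which is an assumed proxy rather than the paper's criterion. Repair promotes at most `budget` offloaded rows and, when the active tier is full, swaps out the weakest active row so the active size stays fixed.

```python
# Hypothetical sketch of a two-tier KV cache with eviction and budgeted repair.
# Not RepairKV's actual implementation; scoring and swap policy are assumptions.
import math
from dataclasses import dataclass, field

Vector = list[float]
Row = tuple[Vector, Vector]  # (key, value) for one token position


def relevance(key: Vector, queries: list[Vector]) -> float:
    """Assumed relevance proxy: best scaled dot product over recent queries."""
    scale = 1.0 / math.sqrt(len(key))
    return max(sum(q_i * k_i for q_i, k_i in zip(q, key)) for q in queries) * scale


@dataclass
class TwoTierKVCache:
    capacity: int  # max rows in the active tier (stands in for GPU memory)
    active: dict[int, Row] = field(default_factory=dict)
    offloaded: dict[int, Row] = field(default_factory=dict)  # stands in for host memory

    def __post_init__(self) -> None:
        if self.capacity <= 0:
            raise ValueError("capacity must be positive")

    def append(self, pos: int, key: Vector, value: Vector) -> None:
        self.active[pos] = (key, value)

    def evict(self, positions: list[int]) -> None:
        # Eviction offloads rows instead of discarding them, so they can be repaired later.
        for pos in positions:
            self.offloaded[pos] = self.active.pop(pos)

    def repair(self, queries: list[Vector], budget: int) -> list[int]:
        """Promote up to `budget` offloaded rows that outscore the weakest active rows."""
        score = {p: relevance(k, queries) for p, (k, _) in self.offloaded.items()}
        score.update({p: relevance(k, queries) for p, (k, _) in self.active.items()})
        # Best offloaded candidates first, truncated to the promotion budget.
        candidates = sorted(self.offloaded, key=score.__getitem__, reverse=True)[:budget]
        promoted = []
        for pos in candidates:
            if len(self.active) >= self.capacity:
                weakest = min(self.active, key=score.__getitem__)
                if score[weakest] >= score[pos]:
                    break  # later candidates score no higher, so no further swap helps
                # Swap: demote the weakest active row to keep the active size unchanged.
                self.offloaded[weakest] = self.active.pop(weakest)
            self.active[pos] = self.offloaded.pop(pos)
            promoted.append(pos)
        return promoted


# Usage: one "needle" key among distractors; an earlier policy evicted the needle.
cache = TwoTierKVCache(capacity=3)
cache.append(0, [1.0, 0.0], [0.5, 0.5])  # needle
for pos in (1, 2, 3):
    cache.append(pos, [0.0, 1.0], [0.0, 0.0])  # distractors
cache.evict([0])
promoted = cache.repair(queries=[[1.0, 0.0]], budget=1)
print(promoted, sorted(cache.active))  # [0] [0, 2, 3]
```

The swap-based design keeps the active tier at a fixed size, which mirrors the abstract's comparison against a no-repair baseline "at the same active-cache budget". A real system would also have to move key and value tensors per layer and head between host and device memory, which this sketch does not model.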
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 171