Key-Value (KV) caching is a widely adopted technique in large language models (LLMs) to accelerate long-context inference. While recent studies predominantly focus on question-dependent KV cache eviction, where cache entries are evicted based on known queries, we observe that these approaches often fail in question-independent scenarios, such as multi-turn dialogues and chunk pre-caching in retrieval-augmented generation (RAG), where future queries remain unknown. Our empirical analysis reveals that most existing KV cache eviction methods underperform in this setting due to their heavy reliance on importance metrics derived from question tokens. The core challenge is to produce well-founded estimates of token importance without access to future questions. To address this, we propose OracleKV, a method for question-independent KV cache eviction. OracleKV steers the model's attention with oracle guidance that captures surface-level statistics of user preferences drawn from large-scale real-world dialogues. Unlike existing methods, OracleKV operates at the data level, allowing seamless integration with other eviction algorithms in a plug-and-play manner. We evaluate OracleKV on both multi-turn and single-turn benchmarks, demonstrating its efficiency and effectiveness. Furthermore, we reveal the significant potential of data-level intervention in KV cache compression, expanding the design space for future research.
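To make the data-level, plug-and-play idea concrete, here is a minimal sketch, not the paper's implementation: all names, shapes, and the scoring heuristic below are our own assumptions. It illustrates question-independent eviction where attention paid by appended oracle-guidance tokens serves as a query-free importance score that any downstream eviction algorithm could consume.

```python
# Hypothetical sketch of data-level, question-independent KV eviction.
# Assumption: attention weights over the context are available, and the
# oracle-guidance tokens occupy the final rows of the attention matrix.
import numpy as np

def importance_from_oracle(attn: np.ndarray, oracle_rows: slice) -> np.ndarray:
    """Average attention that the (hypothetical) oracle-guidance tokens
    pay to each context token; a question-free importance proxy."""
    return attn[oracle_rows].mean(axis=0)

def evict(keys: np.ndarray, values: np.ndarray,
          scores: np.ndarray, budget: int):
    """Keep the `budget` highest-scoring KV entries. Any eviction
    algorithm could consume `scores` here, in plug-and-play fashion."""
    keep = np.argsort(scores)[-budget:]
    keep.sort()  # preserve positional order of retained entries
    return keys[keep], values[keep]

# Toy example: 8 tokens total, the last 2 are oracle-guidance tokens.
rng = np.random.default_rng(0)
attn = rng.random((8, 8))
attn /= attn.sum(axis=1, keepdims=True)          # row-normalize weights
scores = importance_from_oracle(attn, slice(6, 8))[:6]
K, V = rng.random((6, 4)), rng.random((6, 4))    # 6 cached KV entries
K_small, V_small = evict(K, V, scores, budget=3)
print(K_small.shape)  # (3, 4): half of the cache retained
```

Because the guidance enters through the data rather than the scoring rule, swapping in a different eviction backend would only require replacing `evict` while reusing the same oracle-derived scores.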