AgentKV: Phase-Aware KV Eviction for Agentic LLMs

Published: 01 Jun 2026, Last Modified: 11 Jun 2026, Venue: AdaptFM Poster, License: CC BY 4.0
Keywords: Machine Learning, KV Cache, Agentic LLMs, Long Context
TL;DR: A phase-aware KV-cache eviction method for agentic LLM workloads
Abstract: Agentic workloads can consume, on average, 1000 times more tokens than chatbot workloads, stressing both KV-cache capacity and decode-time bandwidth. Existing query-centric eviction methods estimate key importance using representative queries drawn from recent tokens, an approach that works when future attention resembles recent attention. We show that agentic generation violates this assumption: its future query distribution is a mixture over think, act, tool, and other phases, whose components occupy measurably different query subspaces. As a result, recency-based representatives overrepresent the current phase and under-score keys needed by other phases. We propose AgentKV, a phase-aware eviction method that maintains a small set of queries per phase and scores cached keys against their union. AgentKV ranks first or tied for first among compressed-cache methods on 78% of BFCL tasks and 61% of $\tau^2$-bench tasks, and improves output-token throughput by up to 1.80 times.
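The scoring idea in the abstract (keep a few representative queries per phase, then score cached keys against their union) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `phase_aware_keep_indices`, the single-head layout, the softmax scoring, and the choice to aggregate by the maximum attention weight across queries are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def phase_aware_keep_indices(keys, phase_queries, budget):
    """Pick which cached keys to keep under a fixed budget (hypothetical sketch).

    keys:          (n_keys, d) array of cached key vectors for one head.
    phase_queries: dict mapping a phase name to an (m, d) array of
                   representative queries for that phase.
    budget:        number of keys to retain.
    """
    n_keys, d = keys.shape
    budget = min(budget, n_keys)
    if budget <= 0:
        return np.empty(0, dtype=int)

    # Union of the per-phase representative queries.
    queries = np.concatenate(list(phase_queries.values()), axis=0)

    # Scaled dot-product attention weights, one row per query.
    logits = queries @ keys.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)

    # Assumption: a key is as important as its largest weight under any query.
    scores = attn.max(axis=0)

    # Indices of the `budget` highest-scoring keys, in cache order.
    keep = np.argpartition(-scores, budget - 1)[:budget]
    return np.sort(keep)

# Toy example: key 0 matters to the "think" query, key 1 to the "tool" query.
keys = np.array([[10.0, 0.0], [0.0, 10.0], [0.1, 0.1], [-10.0, -10.0]])
think = np.array([[1.0, 0.0]])
tool = np.array([[0.0, 1.0]])

# Scoring with only the current phase drops the key the tool phase needs.
print(phase_aware_keep_indices(keys, {"think": think}, 2).tolist())                # [0, 2]
# Scoring against the union of phases keeps it.
print(phase_aware_keep_indices(keys, {"think": think, "tool": tool}, 2).tolist())  # [0, 1]
```

In the toy example, scoring with only the current phase's query evicts key 1, which only the tool-phase query attends to. That is the failure mode the abstract attributes to recency-based representatives. Other aggregation rules (for example a mean over queries, or per-phase quotas) would fit the same interface.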
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 147