Towards a Collaborative Memory for Agentic Workflows: Breaking the Prefix Barrier with Segment-Level KV Cache Sharing
Keywords: KV Cache Sharing, Multi-Agent, Working Memory
Abstract: In LLM-based multi-agent systems, the Key-Value (KV) cache serves as a critical carrier of agents' working memory, and its efficient reuse is paramount for improving serving throughput and inference efficiency. However, prevailing KV cache reuse methods rely on a rigid prefix-matching mechanism, which mandates exact equivalence between the query request and the cached prefix. This inflexible matching scheme struggles to accommodate the highly heterogeneous instruction prompt templates found in multi-agent environments, thereby severely constraining overall system throughput. To overcome these limitations, this paper introduces a novel collaborative memory approach, underpinned by a Segment-Level KV Cache Sharing mechanism. This method decomposes the cache unit into fine-grained semantic segments, enabling agents to dynamically reuse KV cache segments generated by any other agent at arbitrary positions, without relying on sequential prefix consistency. Our approach not only significantly boosts the inference efficiency of LLMs in agentic workflows but also achieves genuine working-memory sharing and collaboration, thereby enhancing cooperative capabilities among agents. Our implementation is built upon the vLLM framework and leverages the PagedAttention mechanism. Extensive experimental results demonstrate that the proposed method markedly reduces redundant computation, increases system throughput, and even improves the performance of agentic workflows on benchmark tests through effective working-memory sharing.
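To make the core idea concrete, here is a minimal sketch (not the paper's implementation, and independent of vLLM) of position-independent, segment-level cache reuse: prompts are split into fixed-length token segments that are keyed by their content hash rather than by their prefix position, so a segment produced by one agent can be found and reused by another agent even when it appears elsewhere in the prompt. The class and parameter names (`SegmentKVCache`, `segment_len`) are hypothetical.

```python
# Hypothetical sketch of segment-level KV cache sharing: segments are
# keyed by content hash, so reuse does not require a matching prefix.
from hashlib import sha256

class SegmentKVCache:
    def __init__(self, segment_len=16):
        self.segment_len = segment_len      # tokens per cache segment
        self.store = {}                     # content hash -> KV data (placeholder)

    def _key(self, tokens):
        # Hash the segment's token IDs; position in the prompt is irrelevant.
        return sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def split(self, tokens):
        n = self.segment_len
        return [tokens[i:i + n] for i in range(0, len(tokens), n)]

    def lookup(self, tokens):
        """Return (hits, misses): segments already cached vs. to be computed."""
        hits, misses = [], []
        for seg in self.split(tokens):
            (hits if self._key(seg) in self.store else misses).append(seg)
        return hits, misses

    def insert(self, tokens, kv):
        for seg in self.split(tokens):
            self.store.setdefault(self._key(seg), kv)

cache = SegmentKVCache(segment_len=4)
agent_a = [101, 102, 103, 104, 200, 201, 202, 203]   # agent A's prompt tokens
cache.insert(agent_a, kv="dummy-kv")

# Agent B shares one segment with A, but at a different position,
# so strict prefix matching would find no reusable cache at all.
agent_b = [900, 901, 902, 903, 101, 102, 103, 104]
hits, misses = cache.lookup(agent_b)
print(len(hits), len(misses))   # one segment reused, one recomputed
```

Under strict prefix matching, agent B's request reuses nothing because its first token differs from agent A's; content-keyed segments recover the shared portion regardless of where it occurs.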
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 5797