Keywords: Caching, Memory, Serving, LLM Agents
Abstract: LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements.
Existing LLM caching techniques (like context caching and semantic caching), primarily designed for serving chatbots, are insufficient for agent applications where outputs depend on external data and environmental contexts.
We propose **Agentic Plan Caching (APC)**, a novel **test-time memory** that extracts, stores, adapts, and reuses structured plan templates from the planning stages of agent applications across semantically similar tasks, reducing serving cost and latency.
Unlike traditional semantic caching, our system extracts plan templates from completed agent executions at test time, employs keyword extraction to match new requests against cached plans, and uses lightweight models to adapt these templates into task-specific plans for the new context (sketched below).
Evaluation across multiple real-world agent applications shows that our system can reduce costs by 50.31% and latency by 27.28% on average while maintaining performance, offering a more efficient solution for serving LLM-based agents that complements existing LLM serving infrastructures.
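To make the extract/match/adapt loop from the abstract concrete, here is a minimal Python sketch under stated assumptions. All names (`PlanCache`, `extract_keywords`, `adapt_template`), the Jaccard-overlap matching, and the 0.5 threshold are illustrative stand-ins, not the paper's actual design; the authors' keyword extractor and adaptation step are presumably model-based rather than the naive token heuristics used here.

```python
# Hypothetical sketch of an agentic plan cache: store plan templates from
# completed executions, match new requests by keyword overlap, and adapt
# a cache hit to the new task. Names and heuristics are assumptions.
from dataclasses import dataclass, field


def extract_keywords(text: str) -> frozenset[str]:
    """Naive keyword extraction: lowercase tokens minus stopwords.
    A real system would likely use a learned or LLM-based extractor."""
    stopwords = {"the", "a", "an", "of", "to", "for", "and", "in"}
    return frozenset(t for t in text.lower().split() if t not in stopwords)


@dataclass
class PlanCache:
    """Maps keyword signatures of past tasks to reusable plan templates."""
    entries: list[tuple[frozenset[str], str]] = field(default_factory=list)

    def store(self, task: str, plan_template: str) -> None:
        # Record a template extracted from a completed agent execution.
        self.entries.append((extract_keywords(task), plan_template))

    def lookup(self, task: str, min_overlap: float = 0.5) -> str | None:
        # Match the new request against cached plans by keyword overlap
        # (Jaccard similarity); return the best template above threshold.
        query = extract_keywords(task)
        best, best_score = None, min_overlap
        for keys, template in self.entries:
            score = len(query & keys) / max(len(query | keys), 1)
            if score >= best_score:
                best, best_score = template, score
        return best


def adapt_template(template: str, task: str) -> str:
    """Stand-in for the lightweight model that specializes a cached
    template to the new task's context."""
    return f"{template}\n# adapted for: {task}"


if __name__ == "__main__":
    cache = PlanCache()
    cache.store(
        "summarize the quarterly sales report",
        "1. load document\n2. extract key figures\n3. draft summary",
    )
    hit = cache.lookup("summarize the annual sales report")
    if hit is not None:
        # Cache hit: a cheap adaptation replaces a full planning pass.
        print(adapt_template(hit, "summarize the annual sales report"))
```

On a hit, the expensive planning call is skipped and only the lightweight adaptation runs, which is the mechanism behind the reported cost and latency savings; on a miss, the agent plans from scratch and the resulting template is stored for future reuse.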
Primary Area: Infrastructure (e.g., libraries, improved implementation and scalability, distributed solutions)
Submission Number: 19073