Keywords: LLM serving, database
Abstract: Large language models (LLMs) rely on Key-Value (KV) caches to reduce time-
to-first-token (TTFT) latency, but existing disk-based KV cache systems using
file-per-object layouts suffer from severe scalability bottlenecks due to file system
metadata overhead, I/O inefficiency, and poor spatial locality. This paper presents
SGLANG-LSM, a database-inspired system that leverages Log-Structured Merge-
tree (LSM-tree) architectures for scalable KV cache management. SGLANG-LSM
implements a layered system design with three coordinated components: (1) a
prefix-preserving storage engine that maintains token sequence locality while
efficiently storing large KV cache tensors through key-value separation, (2) an
adaptive controller that dynamically optimizes LSM-tree configurations based on
shifting workload characteristics, and (3) runtime services including batch opera-
tions and automatic resource management for production deployment. Evaluation
on large-scale dynamic workloads demonstrates that SGLANG-LSM improves the
cache hit rate by up to 143% and reduces TTFT by up to 24% compared to
state-of-the-art systems. To our knowledge, SGLANG-LSM is the first systematic
application of database storage architectures to large-scale LLM cache management.
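To make the prefix-preserving, key-value-separated layout concrete, below is a minimal sketch (not code from the paper; all names such as `PrefixStore` and `encode_prefix` are hypothetical). It keeps an ordered index of encoded token prefixes, so shared prefixes sort adjacently, while large KV tensors live in a separate append-only value log, which is the essence of key-value separation in LSM-style stores.

```python
# Minimal sketch of a prefix-preserving, key-value-separated cache layout.
# An ordered index (here, a sorted list standing in for an LSM-tree) maps
# encoded token prefixes to (offset, length) pointers into an append-only
# value log that holds the large KV tensors.
import bisect
import pickle


def encode_prefix(token_ids):
    # Fixed-width big-endian encoding keeps lexicographic key order
    # consistent with token-sequence prefix order.
    return b"".join(t.to_bytes(4, "big") for t in token_ids)


class PrefixStore:
    def __init__(self):
        self.keys = []          # sorted encoded prefixes (stand-in for the index)
        self.offsets = []       # parallel array of (offset, length) into the log
        self.log = bytearray()  # append-only value log (key-value separation)

    def put(self, token_ids, kv_tensor):
        blob = pickle.dumps(kv_tensor)
        offset, length = len(self.log), len(blob)
        self.log.extend(blob)
        key = encode_prefix(token_ids)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.offsets.insert(i, (offset, length))

    def longest_prefix(self, token_ids):
        # Return the cached entry for the longest stored prefix of token_ids.
        for end in range(len(token_ids), 0, -1):
            key = encode_prefix(token_ids[:end])
            i = bisect.bisect_left(self.keys, key)
            if i < len(self.keys) and self.keys[i] == key:
                offset, length = self.offsets[i]
                return end, pickle.loads(bytes(self.log[offset:offset + length]))
        return 0, None


store = PrefixStore()
store.put([1, 2, 3], {"layer0": [0.1, 0.2]})
matched, kv = store.longest_prefix([1, 2, 3, 4])
print(matched, kv)  # 3 {'layer0': [0.1, 0.2]}
```

Because only small pointers sit in the ordered index, compaction and lookups touch little data, while prefix-adjacent keys preserve the spatial locality that file-per-object layouts lose.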
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 17124