SafeKV: Safe KV-Cache Sharing in LLM Serving

Published: 21 May 2025, Last Modified: 17 Jun 2025, MLArchSys 2025 Oral, CC BY 4.0
Presentation: Virtual
Keywords: Security, KV-Cache, LLM Inference, Timing Side-Channel Attack
Presenter Full Name: Kexin Chu
Presenter Email: kexin.chu@uconn.edu
Abstract: Global KV cache sharing significantly improves the efficiency of LLM inference but introduces substantial privacy risks. Strict per-user cache isolation protects user data but reduces performance, adding 8–38.9% overhead in time-to-first-token (TTFT) on LLaMA2-70B in our experiments. To bridge this gap, we present SafeKV, a privacy-aware KV cache management system that enables selective sharing of non-sensitive cache entries while isolating sensitive ones in private caches. SafeKV integrates ChunkGuard, a lightweight, real-time detector that classifies sensitive content at the chunk level, with a decoupled cache architecture consisting of a batched Cache Search Engine, Allocator, Monitor, and Evictor. This design supports constant-time batched prefix lookups and enforces fine-grained privacy policies with minimal overhead. By combining privacy-preserving inference with high cache reuse efficiency, SafeKV restores the benefits of global sharing while providing strong runtime privacy guarantees.
Presenter Bio: https://scholar.google.com/citations?user=ZIdS3d0AAAAJ&hl=en
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: https://youtu.be/SJqN4HY1HKQ
YouTube Link Poster: --
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Google Slides: https://docs.google.com/presentation/d/1FVDrxewN-QArsBmSR3bMOebDcSDHRt3BjpDy5bMeHlU/edit?usp=sharing
Poster: No
Workshop Registration: Yes, the presenter has registered for the workshop.
YouTube Link Short: https://scholar.google.com/citations?user=ZIdS3d0AAAAJ&hl=en
Submission Number: 15
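
Illustrative sketch: the selective-sharing idea described in the abstract can be pictured roughly as follows. This is a minimal toy under stated assumptions, not SafeKV's implementation. The names SelectiveKVCache, is_sensitive, chunk_keys, and CHUNK_SIZE are hypothetical; the regex detector is only a stand-in for ChunkGuard; and the rule that every chunk after a sensitive one also stays private is an assumption made here because each chunk key covers the whole preceding prefix.

```python
import hashlib
import re
from collections import defaultdict

CHUNK_SIZE = 4  # tokens per chunk (illustrative value, not from the paper)

# Toy stand-in for a sensitivity detector: flags "@" or runs of 4+ digits.
_SENSITIVE = re.compile(r"@|\d{4,}")


def is_sensitive(chunk):
    """Return True if any token in the chunk looks sensitive (toy heuristic)."""
    return any(_SENSITIVE.search(tok) for tok in chunk)


def chunk_keys(tokens):
    """Yield (prefix_hash, chunk) for each full chunk.

    Each hash covers the entire prefix up to and including that chunk,
    so equal keys imply equal prefixes (up to hash collisions).
    """
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % CHUNK_SIZE
    for i in range(0, full, CHUNK_SIZE):
        chunk = tokens[i:i + CHUNK_SIZE]
        h.update("\x00".join(chunk).encode() + b"\x01")
        yield h.hexdigest(), chunk


class SelectiveKVCache:
    def __init__(self):
        self.shared = {}                  # prefix hash -> KV placeholder
        self.private = defaultdict(dict)  # user -> {prefix hash -> KV placeholder}

    def insert(self, user, tokens):
        """Cache each chunk in the shared pool until a sensitive chunk appears."""
        tainted = False
        for key, chunk in chunk_keys(tokens):
            # Assumption: once a prefix contains sensitive content,
            # all later chunks of that prefix are kept private too.
            tainted = tainted or is_sensitive(chunk)
            target = self.private[user] if tainted else self.shared
            target[key] = f"kv:{key[:8]}"

    def matched_chunks(self, user, tokens):
        """Count leading chunks this user can reuse (shared or own private)."""
        private = self.private.get(user, {})
        n = 0
        for key, _ in chunk_keys(tokens):
            if key not in self.shared and key not in private:
                break
            n += 1
        return n


cache = SelectiveKVCache()
prompt = "you are a helpful assistant . my email is alice@example.com thanks !".split()
cache.insert("alice", prompt)
print(cache.matched_chunks("alice", prompt))  # 3: two shared chunks plus one private
print(cache.matched_chunks("bob", prompt))    # 2: only the non-sensitive shared prefix
```

In this toy, a second user sees cache hits only on the non-sensitive prefix, so hit timing on the sensitive chunk reveals nothing about another user's content.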