OpenReview.net
Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache
Zhenyu Zhang, Shiwei Liu, Runjin Chen, Bhavya Kailkhura, Beidi Chen, Atlas Wang
Published: 2024, Last Modified: 27 Sept 2024
MLSys 2024
CC BY-SA 4.0